Method and apparatus for predicting the onset of seizures based on features derived from signals indicative of brain activity

ABSTRACT

This invention is a method and system for predicting the onset of a seizure in an individual prior to electrographic onset. During an “off-line” mode, signals representing brain activity of an individual (either stored or real-time) are collected, and features are extracted from those signals. A subset of features, which comprise a feature vector, is selected by a predetermined process to most efficiently predict (and detect) a seizure in that individual. An intelligent prediction subsystem is also trained “off-line” based on the feature vector derived from those signals. During “on-line” operation, features are continuously extracted from real-time brain activity signals to form a feature vector, and the feature vector is continuously analyzed with the intelligent prediction subsystem to predict seizure onset in a patient. The system and method are preferably implemented in an implanted device (102) that is capable of externally warning an individual of the probability of a seizure and/or automatically taking preventative actions to abort the seizure. In addition, methods are provided for applying intervention measures to an animal to abort or modulate a seizure by adjusting the modality and/or parameters of an intervention measure based upon a probability measure indicative of a likelihood of seizure occurrence and/or a predicted time to seizure onset.

This application claims priority to U.S. Provisional Application No. 60/097,580 filed Aug. 24, 1998 and U.S. Provisional Application No. 60/129,420 filed Apr. 15, 1999.

FIELD OF THE INVENTION

The present invention is directed to predicting the onset of epileptic seizures, and more specifically to a method and apparatus for automatically interpreting information representing the activity of the brain so as to predict the onset of a seizure in order to alert a patient of the possibility of an impending seizure and/or to take preventative actions to avert a seizure.

BACKGROUND OF THE INVENTION

Epilepsy affects approximately 1% of the population in the United States and approximately 2% of the population worldwide. Of those affected by the disease, approximately one-third have seizures that cannot be controlled by medication or cured by surgery. Epilepsy surgery requires locating the region of the brain where seizure onset occurs and the pathways through which the seizures spread, a process that is not completely accurate and reliable. Moreover, epilepsy surgery is accompanied by the inherent risk of neurologic injury, disfigurement and other complications. Some individuals have epileptic seizures that cannot be controlled by standard medication, are inoperable because seizure onset is not localized, or originate from vital areas of the brain which cannot be surgically removed. These individuals may resort to high doses of intoxicating medications and/or other experimental therapies.

Several prior art algorithms for seizure prediction and/or detection are known. See, for example, U.S. Pat. No. 5,857,978, to Hively et al., entitled “Epileptic Seizure Prediction by Nonlinear Methods,” U.S. Pat. No. 3,863,625, to Viglione et al., entitled “Epileptic Seizure Warning System,” and U.S. Pat. No. 4,566,464, entitled “Implantable Epilepsy Monitor Apparatus.”

It is desirable to provide a method and apparatus for predicting seizures with such accuracy that the activity of the brain can be monitored by an implantable device to warn a patient of the likelihood of an impending seizure, and/or to take preventative actions through application of intervention measures to abort or modulate the seizure prior to clinical onset.

SUMMARY OF THE INVENTION

Briefly, the present invention is directed to a method and apparatus for predicting the onset of a seizure in an individual. Whereas prior art systems and algorithms determine that a seizure is occurring after detection of its actual electrical onset, which may or may not occur before detectable clinical manifestations of a seizure, the present invention is directed to a method and apparatus for predicting that a seizure is going to occur sometime well in advance of any detectable electrical onset or clinical onset of seizure activity. The prediction is therefore made before there are visually obvious changes in EEG patterns.

The method and apparatus according to the present invention operate by monitoring signals representing the activity of the brain, extracting features from the signals, deriving a feature vector representing a combination of those features that are determined (during “off-line” analysis of a particular individual and/or from other knowledge of seizure prediction across a number of individuals) to be predictive of seizure onset, and analyzing the feature vector with a trainable algorithm, implemented by, for example, a wavelet neural network, to predict seizure onset. Features are extracted on both an instantaneous basis and a historical basis. Features are collected and analyzed in different time frames, such as over days, hours, minutes, and seconds.

Preferably, the system is implemented in an implantable device that an individual or physician can interface with in much the same manner as an implantable pacemaker or defibrillator. Interface to the implantable device is by way of a body-wearable or attachable patient access unit that includes a display (such as a liquid crystal display), an audible or visible alert, a vibration alert, and a user interface (such as a button keypad). The output of the implantable device may comprise one or more signals indicating a probability of seizure occurrence within one or more specified periods of time in parallel. The patient may program the system via the patient access unit to generate certain levels of alerts based on programmable probability thresholds. Access may also take place via connection to a local or physician's office personal computer and to a central facility via the Internet. Programming can be done by patients with their personal unit, or the physician may choose to completely control this process via periodic checks with an office unit, the patient's home PC, or via the Internet, portable cellular, infrared, microwave or other communication device.

In addition, the system may be programmed to automatically trigger preventative actions, such as the application of an electrical shock, the delivery of one or more drugs or the activation of a pacing algorithm, which can be employed to abort the seizure or mitigate its severity. Outputs from the device may also be used to train the patient in a biofeedback scheme to learn to abort seizures themselves.

A distinguishing theme of the present invention is that the most accurate seizure predictor is one based on the synergy of multiple features, or on a single feature artificially customized from raw data, as opposed to prior art techniques that rely on a single conventional feature. Another important aspect of the invention is the generation as output of one or more probability measures, each associated with a different prediction horizon, that represent the likelihood a seizure will occur during the corresponding prediction horizon.

Another aspect of the invention is a method for applying intervention measures to an animal to abort or modulate a seizure, comprising the step of adjusting the modality of an intervention measure and/or parameters of an intervention measure based upon a probability measure indicative of a likelihood of seizure occurrence and/or a predicted time to seizure onset.

The above and other objects and advantages of the present invention will become more readily apparent when reference is made to the following description taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a stylized timing diagram of an electroencephalographic signal illustrating the distinction between seizure prediction and seizure detection according to the present invention.

FIG. 2 is a timing diagram showing brain activity signals prior to and at the onset of a seizure.

FIG. 3 is a general block diagram of a system for predicting the onset of a seizure according to the present invention.

FIG. 4 is a generalized block diagram showing the overall process for predicting the onset of a seizure according to the present invention.

FIG. 5 is an electrical block diagram showing components of the system according to the invention.

FIG. 6 is a diagram illustrating the creation of neurally computed (artificial or conventional) features according to the system and method of the present invention.

FIG. 7 is a diagram showing a scheme for analyzing features extracted from brain activity signals as a predictor of seizure onset and outputting a plurality of probability measures, each for a corresponding prediction horizon.

FIG. 8 is a graphical diagram illustrating the identification of pre-seizure or non-pre-seizure events with respect to seizure onset.

FIG. 9 is a functional diagram of a wavelet neural network for analyzing a feature vector and outputting a plurality of probability measures as shown in FIG. 7.

FIG. 10 is a graphical diagram showing the theoretical class conditional probability function useful in implementing a predictor using wavelet neural networks.

FIG. 11 is a timing diagram of a fractal dimension feature, exemplifying the utility of a single feature that may be predictive of seizure onset in some patients.

FIG. 12 is a timing diagram of an energy feature that can be monitored for early prediction of seizure onset in some patients.

FIGS. 13-16 are timing diagrams for multiple features in the time, frequency and chaotic domains, which show a synergy for seizure prediction.

FIG. 17 is a timing diagram of a power feature prior to and during a seizure, illustrating the enhanced distinctive burst characteristics leading up to an ictal event.

FIG. 18 illustrates timing diagrams for energy at different time intervals with respect to seizure activity, and indicates the enhanced fluctuation in energy prior to the seizure in contrast to times well removed from seizure activity.

FIG. 19 illustrates several time frames of the complex root of a Pisarenko-related feature preceding a seizure.

FIGS. 20-22 are graphical diagrams showing the trajectory of three features in a three-dimensional feature space during interictal, pre-ictal and ictal periods of a patient having mesial temporal lobe epilepsy.

FIG. 23 is a graphical diagram illustrating accumulated energy for pre-seizure intervals and baseline intervals for an awake patient.

FIG. 24 is a graphical diagram illustrating accumulated energy for pre-seizure intervals and baseline intervals for an asleep patient.

FIG. 25 is a graphical diagram illustrating spectral entropy for five pre-seizure intervals and nine baseline intervals for a patient.

FIG. 26 shows graphical diagrams illustrating four types of high-frequency rhythmic prodromes, one of which gives rise to a seizure.

FIG. 27 is a graphical diagram showing that pre-ictal prodromes are rare at times far removed from seizure onset.

FIG. 28 is a graphical diagram showing that the activity of pre-ictal prodromes increases as a seizure approaches.

FIG. 29 is a graphical diagram that shows the occurrence of pre-ictal prodromes in a single patient prior to six different seizures.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is directed to a system (i.e., method and apparatus) for predicting the onset of a seizure in an individual so that the individual or attending medical personnel can be warned of an impending seizure in order to prepare for it and/or take preventative actions to stop the seizure or substantially mitigate it. Furthermore, the present invention is directed to a fully automatic and interactive system that can be implanted in and/or worn by a patient to alert the patient of the possibility of an impending seizure so that appropriate action can be taken. This action may be undertaken by the patient, a caregiver, etc., or automatically by the system itself.

The terms “individual” and “patient” used herein are meant to include animals in general, and particularly humans. The term “animal” is meant to include humans and non-human animals, and the present invention may have utility in clinical and experimental research on non-human animals.

FIG. 1 illustrates a signal from a single channel of an intracranial EEG and demonstrates the relationship between several important time periods (or events) with respect to the “prediction” of a seizure as opposed to the “detection” of a seizure, according to the present invention.

Timing Definitions

EO=Electrographic Onset of seizure. The beginning of a seizure as marked by the current “gold standard” of expert visual analysis of EEG. EO can be further divided into EEC (earliest electrographic change, the earliest change in the EEG which could signify a seizure) and UEO (unequivocal electrographic onset, the point at which an electrographic seizure is absolutely clear to an expert electroencephalographer).

AD=Automated Detection of EO. The time when prior art algorithms first declare a seizure, normally after EO due to computational requirements, usage of inappropriate features, or lack of effective features.

CO=Clinical Onset. The time when a clinical seizure is first noticeable to an outside observer who is watching the patient from whom the EEG is recorded. CO can be further divided into ECC (earliest clinical change that could signal a seizure onset) and UCO (unequivocal clinical onset).

AP=Automated Prediction of EO. The time at which an automated algorithm (such as the one according to the present invention) first predicts seizure onset. This will ordinarily be well in advance of any visible changes in the EEG or changes in the patient's behavior and, importantly, prior to EO.

PTOT=Prediction-To-Onset Time=EO minus AP
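As a small illustration of the timing definitions, the sketch below computes PTOT and uses its sign to separate a prediction (declared before EO) from a detection (declared at or after EO). The function names and the example times are hypothetical; both times are assumed to be in seconds on the same clock.

```python
def prediction_to_onset_time(eo_s: float, ap_s: float) -> float:
    """PTOT = EO - AP, with both event times in seconds on the same clock."""
    return eo_s - ap_s


def is_prediction(eo_s: float, declared_s: float) -> bool:
    """A declaration is a prediction only if it precedes EO (PTOT > 0);
    a declaration at or after EO is a detection, like AD in FIG. 1."""
    return prediction_to_onset_time(eo_s, declared_s) > 0


# Hypothetical times: EO marked at t = 3600 s, AP declared at t = 2700 s.
print(prediction_to_onset_time(3600.0, 2700.0))  # 900.0
print(is_prediction(3600.0, 3610.0))             # False
```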

As is well known in the art, the events EO and CO occur within some approximate period of time and typically cannot be localized exactly in time.

In accordance with the present invention, and as used hereinafter, seizure prediction means the declaration that a seizure is going to occur sometime well in advance of any detectable electrical (EO) or clinical (CO) onset of seizure activity. This is shown in FIG. 1 as the event AP. At EO, AD and CO, the actual seizure has already begun, though its clinical expression might not be easily apparent if the appropriate central nervous system function is not being tested at the time of electrical onset (e.g., the function corresponding to brain in the ictal onset zone). This is to be distinguished from all known prior art algorithms, in which brain activity is monitored to determine that a seizure is occurring after detection of its actual electrical onset, which may or may not occur in advance of detectable clinical manifestations of a seizure. Therefore, these known algorithms actually function only as seizure “detectors” and do not predict that a seizure is likely to occur. The present invention is also distinguished from prior art prediction algorithms in that there is no exact time of AP. Instead, the present invention involves generating a probability of prediction continuously in different time frames, and the threshold for declaring AP is selectable/adjustable by the patient, caretaker, physician, insurance company, etc.

With reference to FIG. 2, in accordance with the present invention, brain activity signals are continuously monitored in order to detect activity that is predictive of a seizure. The shaded block shown in FIG. 2 is a sliding observation window during which processing of the brain activity signal is continuous. The period of time from the right edge of the observation window to the last instant when a seizure is pharmacologically or electrically preventable is called the prediction horizon. Beyond the prediction horizon, it is no longer possible to significantly deter the onset of the seizure with preventative measures heretofore known, though it may be possible to reduce or mitigate the full clinical expression of a seizure after this time. The pre-ictal time frame for seizure prediction may begin as much as 2-3 hours or more prior to seizure onset.
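The sliding observation window can be sketched as a generator that yields each window together with the time of its right edge, which is the reference point for the prediction horizon. The labeling helper is a simplification and an assumption: it treats the horizon as ending at the marked onset (rather than at the last preventable instant) and calls a window pre-seizure if onset falls within `horizon_s` after its right edge, in the spirit of FIG. 8. The window length, step and sampling rate shown are hypothetical.

```python
import numpy as np


def sliding_windows(signal, fs, window_s, step_s):
    """Yield (right_edge_s, window) for a 1-D signal sampled at fs Hz."""
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must span at least one sample")
    for start in range(0, len(signal) - win + 1, step):
        end = start + win
        yield end / fs, signal[start:end]


def is_preseizure(right_edge_s, onset_s, horizon_s):
    """Label a window pre-seizure if onset falls within horizon_s after its right edge."""
    return right_edge_s < onset_s <= right_edge_s + horizon_s


# Hypothetical example: 10 s of synthetic data at 200 Hz, 2 s window, 1 s step.
x = np.zeros(2000)
edges = [t for t, _ in sliding_windows(x, fs=200, window_s=2.0, step_s=1.0)]
print(len(edges), edges[0], edges[-1])  # 9 2.0 10.0
```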

According to the present invention, a large set of independent, instantaneous and historical features is extracted from the intracranial EEG, real-time brain activity data and/or other physiologic data. Once extracted, the features are processed by a prediction algorithm or intelligent prediction subsystem, such as a wavelet neural network. The intelligent prediction subsystem looks for synergistic properties of these features which together predict seizure onset, though each of the features taken individually may not yield this same predictive information. The feature set is systematically pared down for each individual patient (during “off-line” analysis) to a subset of core parameters which yield maximal predictive value, minimal redundancy and minimal computational requirements. This process of adaptive training will take place periodically throughout the life of the device, and the feature set may be augmented by new or artificially synthesized features during this process. This feature set is represented in vector form and called a feature vector. The feature vector is continuously derived from the raw data.
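A minimal sketch of feature vector formation, mixing instantaneous and historical features, is shown below. The particular features are assumptions chosen because the drawings mention them (energy, FIG. 12; spectral entropy, FIG. 25; accumulated energy, FIGS. 23-24); the exact formulas, the history length and the class name `FeatureVectorBuilder` are illustrative, not the definitions used by the invention.

```python
from collections import deque

import numpy as np


def energy(x):
    """Mean squared amplitude of one window."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 2))


def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    total = psd.sum()
    if psd.size < 2 or total == 0.0:
        return 0.0
    p = psd / total
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum() / np.log(psd.size))


class FeatureVectorBuilder:
    """Combine instantaneous features with historical ones kept across windows."""

    def __init__(self, history_len=60):
        self.recent_energy = deque(maxlen=history_len)  # last history_len windows
        self.accumulated_energy = 0.0                   # running sum since start

    def update(self, window):
        e = energy(window)
        self.recent_energy.append(e)
        self.accumulated_energy += e
        return np.array([
            e,                                   # instantaneous energy
            spectral_entropy(window),            # instantaneous spectral entropy
            float(np.mean(self.recent_energy)),  # historical: recent mean energy
            self.accumulated_energy,             # historical: accumulated energy
        ])
```

A narrowband rhythm concentrates power in few frequency bins and drives spectral entropy toward 0, while broadband activity pushes it toward 1, which is why the two energy-type and entropy-type features can carry complementary information.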

The feature vector is continuously analyzed by the intelligent prediction subsystem as raw data are input into the system. The system outputs a probability that a seizure will occur or, if the circumstances so indicate, that a seizure is occurring (i.e., seizure detection). The probability output by the system is dynamically updated: at one instant the probability of a seizure may appear high, while at subsequent times it may be determined to be lower. This allows the system to learn the dynamics of seizure prediction (and detection) for a particular patient and more accurately determine when a seizure is likely to occur.

THE SYSTEM

FIG. 3 illustrates an example of the general architecture of a seizure prediction and control system according to the present invention. An implantable processing device 102 (also referred to as the implanted unit) and an external wearable processor device 104 (also referred to as the portable unit) are shown. The implanted unit 102 is contained within a bio-compatible housing/enclosure that is implanted in a patient, such as under the patient's clavicle. The components of the portable unit 104 are contained within a housing that is worn by the patient, similar to a cellular telephone, pager, etc.

The electrodes 110 detect signals representative of the activity of the brain. For example, the electrodes 110 may be intracranial electrodes (e.g., depth wires, subdural strips, peg electrodes, etc.); intra-, extra- or trans-vascular electrodes; epidural or bone screw electrodes; scalp electrodes; or other electrodes, such as sphenoidal electrodes or foramen ovale electrodes. The electrodes 110 may detect electroencephalogram (EEG) signals, the DC level of EEG signals, electrochemical changes (such as glutamate levels) or magnetoencephalogram signals. Leads 112 are tunneled under the skin to connect the electrodes 110 to the circuitry in the implanted unit 102. Other physiologic sensors, such as those which monitor heart rate variability, vagus nerve impulses, brain blood flow, or serum chemistry (for example, epinephrine levels), may also be useful to obtain physiologic signals according to the present invention.

The portable unit 104 may be some form of device which combines features of wearable computers, cellular phones, and personal digital assistants. Alternatively, the system can be configured so that the portable unit 104 is not worn but rather is periodically coupled to the patient for bi-directional data/program transfer. For example, the portable unit 104 can be of a type that is placed in a cradle for uploading data obtained from the implanted unit.

The link 114 between the implanted unit 102 and the portable unit 104 is an electrical conductor link, optical link, magnetic link, radio frequency link, sonographic link or another type of wireless link. Depending on the type of link, the implanted unit 102 and the portable unit 104 have the appropriate hardware to achieve communication with each other.

The portable unit 104 is also connectable (using a standard cable, docking station or cradle configuration, or other types of interfaces known in the art) to a personal computer (PC) 115, a network 116, or to remotely located PCs 117 via the Internet 118. For example, data obtained from the implanted unit 102 can be stored and periodically uploaded through the interface between the implanted unit 102 and the portable unit 104 during quiet periods far removed from seizures. In this way, the implanted unit 102 can have a relatively smaller buffer size. The portable unit 104 may include a hard drive storage device having a storage capacity in the gigabyte range. Similarly, information can be downloaded to the portable unit 104 and/or the implanted unit 102 from the PC 115, network 116, or remote PCs 117 to adjust various parameters, as will become more apparent hereinafter. The portable unit 104 also serves as a user interface for the patient or doctor to set alarm thresholds and other options, and as a data communications interface as explained above. Moreover, all of the functions that could be performed directly on the portable unit 104 can also be performed remotely from the PC 115 or remote PCs 117.

Referring to FIG. 4, the process flow according to the present invention will be described. At step 200, intracranial EEG signals or other physiologic signals are sensed by implanted electrodes or other appropriate sensors (that may not be implanted) and are pre-processed (amplified, filtered, multiplexed, etc.) by components in the implanted unit. In step 210, a processor, preferably in the portable unit, extracts premonitory signal characteristics to generate one or more feature vectors. Next, in step 220, the feature vector(s) are processed by an intelligent predictor network, such as a wavelet neural network (implemented in either software or hardware), that continuously estimates the probability that a seizure will occur within one or more fixed or adjustable time periods. Examples of time periods are the next 1 minute, 10 minutes, 1 hour, and 1 day. The portable unit triggers visual displays and auditory cues of this information, and/or commands the implanted unit to administer abortive and/or mitigative therapy.
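The step 220 output can be pictured as one probability per time period. The sketch below assumes the four example periods from the text and one raw predictor output per period; the clipping and the running-maximum repair are a design choice of this sketch, not something the text specifies. They rely on the fact that the periods are nested, so the probability of a seizure within 1 minute can never exceed the probability within 1 day.

```python
from dataclasses import dataclass

import numpy as np

# Example horizons named in the text: 1 minute, 10 minutes, 1 hour, 1 day.
HORIZONS_S = (60, 600, 3600, 86400)


@dataclass(frozen=True)
class HorizonEstimate:
    horizon_s: int
    probability: float


def to_horizon_estimates(raw_outputs):
    """Turn one raw output per horizon into ordered, consistent probabilities."""
    p = np.clip(np.asarray(raw_outputs, dtype=float), 0.0, 1.0)
    if p.shape != (len(HORIZONS_S),):
        raise ValueError("expected exactly one output per horizon")
    # Horizons are nested, so P(within 1 min) <= P(within 10 min) <= ...;
    # a running maximum repairs any violation of that ordering.
    p = np.maximum.accumulate(p)
    return [HorizonEstimate(h, float(v)) for h, v in zip(HORIZONS_S, p)]


print([e.probability for e in to_horizon_estimates([0.2, 0.1, 0.5, 0.4])])
# [0.2, 0.2, 0.5, 0.5]
```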

The signal processing required to extract the features and perform prediction is most likely performed in the implanted unit 102 due to its proximity to the brain activity or other physiologic signals. However, if the link 114 between the implanted unit 102 and the portable unit 104 is of a type that can maintain a rapid upload of the physiologic signals from the implanted unit to the portable unit 104, this signal processing can be performed in the portable unit 104. This is a design consideration and is not critical to the basic concepts of the present invention.

Moreover, for some patients, the algorithmic complexity required for prediction may be such that prediction is achieved in real time on a powerful processor or computer not necessarily located in a miniaturized device (e.g., the implanted unit), although wearable computers are currently commercially available at 233 MHz processor speed/4 Gbytes total storage. The CPU-time-hungry processes could be the learning phases and the extraction of some signal features. The probability estimation, on the other hand, is virtually instantaneous. Therefore, for the training/learning phase, most of the intelligence can be shifted away from the portable unit, if necessary, and into a computer workstation. The initial training can take place during pre-surgical evaluation, and periodic retraining can be accomplished during outpatient visits by hooking up the portable device to a docking station/desktop PC where the intensive programs run. The portable device uploads compressed past performance information, offline learning takes place on the PC, and refreshed parameters are downloaded back into the portable device during an office visit, remotely via the Internet, or via another type of communication device. The device can optionally carry out a less demanding form of online adaptation. For feature extraction processing, memory can be traded off for speed by pre-optimizing artificial features created as wavelet neural network (WNN) models on high-end computers. Feature extractors can then be hardwired into the device, such as by way of a WNN.

Turning to FIG. 5, more details of the signal processing and related components that make up, in some combination, the implanted unit 102 and the portable unit 104 will be described. In one embodiment, the implanted unit 102 comprises signal conditioning circuitry 120, a microprocessor 130, random access memory (RAM) 132, electrically erasable programmable read only memory (EEPROM) 134, an analog-to-digital (A/D) converter 136, a rechargeable Ni-Cd battery 140 and a backup lithium battery 142. In addition, impedance check circuitry 149 monitors the impedance of the electrodes to check for electrode integrity. A software diagnostic routine, executed by the microprocessor 130, checks for overall system integrity (including electrode integrity) at start-up and thereafter on a periodic basis.

The portable unit 104 comprises a keypad 150, a display 152 (such as an LCD), an alarm driver circuit 154 to drive an audible alert device 156, a visible alert device (LED) 157, a vibration alert device 158, a PC interface 160, and a telephone/modem interface 162. The PC interface 160 facilitates communication with a PC 115, and the telephone/modem/network interface 162 facilitates communication with the Internet 118, a telephone network (public, cellular or two-way messaging) or a local network. Information is passed between the implanted unit 102 and the patient access unit 104 via the (data/address/control) bus 146 and over the link 114 (FIG. 3).

In addition, a data buffer 164 is included in either the implanted unit 102 or the portable unit 104 to collect brain activity or other physiologic data to be uploaded. For example, data from pre-ictal (pre-seizure) events are compressed and stored for periodic uploading, either at a physician's office or via a PC, the Internet or telephone, for periodic training updates.
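One way to picture the data buffer 164 is a fixed-capacity ring buffer that always holds the most recent samples, plus a queue of compressed snapshots awaiting upload. This is a sketch under stated assumptions: 16-bit integer samples from the A/D converter, `zlib` as a stand-in for whatever compression the device would actually use, and a hypothetical class name `PreictalBuffer`.

```python
import zlib
from collections import deque

import numpy as np


class PreictalBuffer:
    """Ring buffer of recent samples; snapshots are compressed for later upload."""

    def __init__(self, capacity_samples):
        self.samples = deque(maxlen=capacity_samples)  # oldest samples drop off
        self.pending_uploads = []                      # (label, compressed bytes)

    def push(self, block):
        """Append a block of A/D samples (assumed to fit in int16)."""
        self.samples.extend(np.asarray(block, dtype=np.int16).tolist())

    def snapshot(self, label):
        """Freeze the current buffer contents, e.g. when a seizure is flagged."""
        raw = np.array(self.samples, dtype=np.int16).tobytes()
        self.pending_uploads.append((label, zlib.compress(raw)))

    @staticmethod
    def decompress(blob):
        return np.frombuffer(zlib.decompress(blob), dtype=np.int16)
```

Usage: call `push` for every conditioned block, call `snapshot("patient_flag")` when an event is flagged, and drain `pending_uploads` during the next upload session.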

As explained above, the components of the system 100 that are contained within the implanted unit 102 (a bio-compatible housing for implantation in a patient) may vary. For example, it may prove more practical to include the feature analysis component(s) (i.e., the microprocessor 130) in that portion of the system not implanted within the patient. Moreover, all of the components of the system 100 may optionally be contained within a single housing that is implanted in a patient, with the system programmed, monitored and tuned remotely over a suitable link. In this way, the system 100 cannot be accessed by a patient or other person who is unfamiliar and not comfortable with having direct access to the system 100.

The signal conditioning circuitry 120 performs data compression, amplification, filtering, isolation and multiplexing of the raw data signals from the electrodes 110. In addition, the signal conditioning circuitry 120 removes from the raw data signals drastic “artifacts” determined not to originate from brain activity. This is achieved using well-known artifact rejection technology. After conditioning, the signals are converted to digital signals by the on-board A/D converter 136 for temporary storage in the RAM 132 and further processing by the microprocessor 130. The microprocessor 130, through system parameters and software stored in the EEPROM 134, performs feature extraction and feature vector formation from the digital signals stored in the RAM 132, and also continuously analyzes/evaluates the feature vector with an intelligent prediction subsystem (implemented through software stored in EEPROM 134 or embodied as a separate network or device) to determine a probability of whether a seizure is impending (or is occurring) in the patient.

Alternatively, a digital signal processor (DSP), application specific integrated circuit (ASIC), field programmable gate array (FPGA), or other processing devices known in the art may be used in place of, or in addition to, the microprocessor to perform the feature extraction and analysis functions. It is further envisioned that in certain applications, all of the signal processing functions (pre-processing and feature analysis) may be performed in a single programmable integrated chip or device.

The intelligent prediction subsystem may be implemented by a trainable network, such as, for example, a wavelet neural network (WNN), and is trained with feature vectors to generate an output that consists of a probability measure of seizure occurrence within a predetermined period of time. A WNN is a special class of neural network. A neural network is a mathematical construct composed of multiple layers of nodes that are connected together. Each node has an activation function and each connection between two nodes has a weight. The output of each node is a nonlinear function of all of its inputs. A neural network learns by approximating a multidimensional function over a space spanned by the activation functions for each node. WNNs are neural networks that employ activation functions which are local and semi-orthogonal. WNNs are unique in that they can represent the behavior of a function at various resolutions of inputs. The efficiency and parallel distribution of computation units make WNNs ideally suited for implementation in a high-speed, portable hardware platform useful in the method and apparatus of the present invention.
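The forward pass of a single-hidden-layer WNN can be sketched as follows. The text does not specify the wavelet, the architecture or the training procedure, so everything here is an assumption: each hidden node applies a radial "Mexican hat" wavelet to a translated and dilated copy of the feature vector (which makes it local), and a logistic sigmoid maps each output node to a probability, one output per prediction horizon. The parameter values would come from off-line training, which is not shown.

```python
import numpy as np


class WaveletNeuralNetwork:
    """Hypothetical WNN: wavelet hidden units, sigmoid probability outputs."""

    def __init__(self, translations, dilations, weights, biases):
        self.t = np.asarray(translations, dtype=float)  # (n_hidden, n_features)
        self.s = np.asarray(dilations, dtype=float)     # (n_hidden,)
        self.w = np.asarray(weights, dtype=float)       # (n_hidden, n_outputs)
        self.b = np.asarray(biases, dtype=float)        # (n_outputs,)

    def hidden(self, x):
        """Radial Mexican-hat activations (1 - r^2) * exp(-r^2 / 2)."""
        z = (np.asarray(x, dtype=float)[None, :] - self.t) / self.s[:, None]
        r2 = np.sum(z ** 2, axis=1)  # squared scaled distance to each center
        return (1.0 - r2) * np.exp(-r2 / 2.0)

    def predict_proba(self, x):
        """One probability per output node (e.g. per prediction horizon)."""
        a = self.hidden(x) @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-a))


# One hidden node centered at the origin, one output.
net = WaveletNeuralNetwork([[0.0, 0.0]], [1.0], [[2.0]], [0.0])
print(net.predict_proba([0.0, 0.0]))
```

Because each hidden unit decays quickly away from its translation, a feature vector far from every center produces an output near the bias level (0.5 here), and small dilations resolve fine structure while large ones capture coarse trends, which is the multiresolution property described above.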

The intelligent prediction subsystem is trained to optimize the expected value of a performance metric (after thresholding the output probability). As an example of one of many suitable performance metrics, a metric called the convexly weighted classification (prediction) accuracy (CWCA) is defined as CWCA = αCPR + (1 − α)CNR, where CPR is the fraction of times that the seizure is correctly predicted within the universe of imminent seizures (called sensitivity) and CNR is the correct negative rate, i.e., 1 minus the false alarm rate. The weight α, which trades CPR against CNR, depends on false alarm tolerance and is adaptively adjusted for a particular patient.
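The CWCA definition translates directly into code. The sketch below computes CPR and CNR from labeled windows and thresholded predictions, then shows one assumed use of the metric: picking, from a grid of candidate thresholds, the one that maximizes CWCA on labeled data. The threshold grid and the function names are hypothetical.

```python
import numpy as np


def cwca(y_true, y_pred, alpha):
    """Convexly weighted classification accuracy: alpha*CPR + (1 - alpha)*CNR.

    y_true: True where a seizure was imminent (pre-seizure window).
    y_pred: True where the thresholded output declared a prediction.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    n_pos = np.count_nonzero(y_true)
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both pre-seizure and non-pre-seizure examples")
    cpr = np.count_nonzero(y_true & y_pred) / n_pos    # sensitivity
    cnr = np.count_nonzero(~y_true & ~y_pred) / n_neg  # 1 - false alarm rate
    return float(alpha * cpr + (1.0 - alpha) * cnr)


def best_threshold(y_true, prob, alpha, candidates=None):
    """Pick the probability threshold that maximizes CWCA on labeled data."""
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)
    prob = np.asarray(prob, dtype=float)
    scores = [cwca(y_true, prob >= th, alpha) for th in candidates]
    i = int(np.argmax(scores))
    return float(candidates[i]), scores[i]


# CPR = 1/2, CNR = 3/4, alpha = 0.5  ->  CWCA = 0.625
print(cwca([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0], alpha=0.5))
```

A patient with low false alarm tolerance would be given a small α, so that CNR dominates and higher thresholds win.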

The intelligent prediction subsystem is trained “off-line” with brain activity data, such as EEG data, or other physiologic data, using a global training set of EEG data as well as EEG data for the particular individual for whom the system will be used. Specifically, in the “off-line” mode, features are extracted and selected using actual brain activity data for a particular individual to optimize the prediction capability and to minimize calculation and processing. The intelligent prediction subsystem is then trained based on that feature vector. Once the feature vector has been optimized and the intelligent prediction subsystem trained on that feature vector, the system is ready for “on-line” use for a particular individual. During “on-line” operation, the system continuously processes real-time brain activity data from a patient, analyzes the data, makes a declaration of the probability of seizure onset on several time horizons (or a declaration of seizure onset if a seizure is occurring), and generates one of the possible outputs described above. Further, while “on-line,” the intelligent prediction subsystem may undergo further learning based on the real-time data to tune more finely to the brain activity characteristics of a particular individual. In addition, the intelligent prediction subsystem is designed to detect seizures in the event of missed predictions, and to automatically trigger a warning in response to detecting the electrical onset of seizures. Patient interaction with the system in the event of false positive alarms will further facilitate “on-line” learning of the intelligent prediction subsystem. For example, the patient may flag that a seizure has occurred, and buffered data will be stored and labeled accordingly in the implanted unit. On periodic retraining, these flagged data will be inspected to verify that a seizure has indeed occurred, and updated training of the intelligent prediction subsystem will then take place to reflect this occurrence, if necessary.
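The text leaves the off-line feature selection process unspecified ("a predetermined process"). As one plainly swapped-in example, the sketch below uses greedy forward selection: at each step it adds the candidate feature that most improves a caller-supplied score, stopping at a feature budget or when the gain becomes negligible. The `score_fn` is hypothetical; in practice it might be a cross-validated CWCA minus a penalty for computational cost.

```python
def forward_select(feature_names, score_fn, max_features, min_gain=1e-3):
    """Greedy forward selection of a feature subset.

    score_fn(list_of_names) -> float, higher is better (supplied by the caller).
    Ties between equally scoring candidates are broken by name, arbitrarily.
    """
    selected, remaining = [], list(feature_names)
    best = float("-inf")
    while remaining and len(selected) < max_features:
        score, name = max((score_fn(selected + [f]), f) for f in remaining)
        if selected and score - best < min_gain:
            break  # adding more features no longer pays for itself
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best


# Toy score: energy and entropy are each worth 0.3, fractal dimension only 0.1.
def toy_score(subset):
    return 0.3 * len(set(subset) & {"energy", "entropy"}) + 0.1 * ("fractal" in subset)


print(forward_select(["energy", "entropy", "fractal"], toy_score, max_features=2))
```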

Initial training of the device may or may not have to take place at the hospital. In one scheme, the patient is admitted to the hospital, several seizures are recorded, and the device is trained for the first time. Subsequent periodic interactions of the device with a remote PC further refine learning based upon periodically buffered data and events. Changes in anti-epileptic and other medications may require some retraining as well. In a second scheme, the unit is implanted and the patient is released from the hospital without initial recorded seizures or training. Seizure and pre-seizure data are buffered, and periodic training is performed off-line on a remote PC. This scheme may be preferable in some ways, because spontaneous seizures recorded out of hospital may have different signal characteristics than those induced by rapid medication taper.

The system is programmable to respond to the output of the intelligent prediction subsystem by taking one or more actions. For example, the microprocessor 130 may output a warning signal to trigger the driver alarm circuit 154 to activate the audible alert device 156, the visible alert device 157, and/or the vibration alert device 158, and/or to display a suitable warning message on the display 152. The event may also be communicated by cellular telephone and/or e-mail, or data representing the event may be stored for later transmission. The intelligent prediction subsystem may provide a continuous output representing the probability that a seizure will occur within a certain time horizon, or several continuous outputs representing probabilities for multiple time horizons. A warning can also be issued to persons other than the patient when the probability of a seizure exceeds a set threshold over a certain time period (e.g., sending an alert to a child's mother, teacher, or physician).

The system 100 may be programmed (through the keypad 150, for example) to set thresholds for certain alarms, such as display alarms, audible alarms, and vibration alarms. Once an alarm has been activated, further increases in the probability measure are indicated by corresponding increases in alarm duration and intensity. The logic for interpreting the output of the intelligent prediction subsystem against the various programmable thresholds may alternatively be included in a separate controller in the patient access unit 104, rather than in the implanted unit 102.
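The threshold-and-escalation behaviour described above can be sketched as follows. This is a minimal illustration, not the device firmware: the `AlarmSetting` class, the `alarm_actions` function, the linear intensity ramp, and the base duration are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class AlarmSetting:
    """A hypothetical programmable alarm: a channel name and its activation threshold."""
    name: str
    threshold: float  # probability in [0, 1) at which this alarm turns on


def alarm_actions(p, settings, base_duration_s=2.0):
    """Return (name, intensity, duration_s) for each alarm whose threshold p reaches.

    Intensity rises linearly from 0 at the threshold to 1 at p = 1, and the
    duration grows with intensity; both scalings are illustrative assumptions.
    """
    actions = []
    for s in settings:
        if p >= s.threshold:
            intensity = min(1.0, (p - s.threshold) / (1.0 - s.threshold))
            actions.append((s.name, intensity, base_duration_s * (1.0 + intensity)))
    return actions
```

For example, with display, audible, and vibration thresholds of 0.3, 0.6, and 0.8, a probability of 0.5 activates only the display alarm, while 0.9 activates all three, each at a higher intensity and longer duration.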

In addition, the microprocessor 130 may be programmed to activate one or more preventative therapies. For example, an electrical shock, a series of shocks, a pacing signal, or particular stimulation patterns can be administered by a stimulation or shock unit 147 via electrodes positioned at locations in or around the brain known to effectively avert a seizure. Electrical shock delivery circuits for generating signals of suitable characteristics to prevent seizures are well known in the art. The shock scheme is an intelligently paced stimulation, as opposed to a thresholded shock or open-loop continuous stimulation (as in a vagal nerve stimulator).

Some stimulation routines may be interactively modified in coordination with sensed brain activity after the system has predicted (or detected) a seizure based on the probability measure output by the intelligent prediction subsystem. Alternatively, one or more drugs or naturally occurring compounds may be automatically delivered to the patient by a drug delivery device 148 worn by or implanted in the patient. Body-wearable and implanted drug delivery devices are well known in the art.

The therapeutic actions and their range of intensity may vary. For example, the system may be programmed to trigger only a mild type of intervention in response to a moderate-probability warning issued for a long prediction time horizon. On the other hand, the system may be programmed to respond to high-probability events for a short prediction time horizon with a more intense intervention. The system may be programmed to select intervention actions only in response to high-probability, short-prediction-horizon events, particularly if the intervention that is effective for a particular individual has significant side effects, such as drowsiness. The continuous probability outputs, their integral, their derivative, and/or any other mathematical derivations thereof may be used to intelligently grade the amount of intervention, particularly if probabilities increase and prediction time horizons shorten over time.
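A minimal sketch of such grading, assuming a hypothetical scale (`none`, `mild`, `moderate`, `intense`) and illustrative cut-offs of 0.5 and 0.8 for probability and 10 minutes for a “short” horizon; none of these values come from the text. It uses the latest probability, the horizon, and a finite-difference derivative; the integral could be folded in the same way.

```python
LEVELS = ["none", "mild", "moderate", "intense"]


def grade_intervention(probs, dt_s, horizon_min, mild_p=0.5, high_p=0.8, short_horizon_min=10.0):
    """Pick an intervention level from recent P_T samples (oldest first) for one horizon.

    All thresholds are hypothetical placeholders, to be programmed per patient.
    """
    p = probs[-1]
    slope = (probs[-1] - probs[-2]) / dt_s if len(probs) >= 2 else 0.0
    if p >= high_p and horizon_min <= short_horizon_min:
        level = 3  # high probability, short horizon: most intense response
    elif p >= high_p:
        level = 2
    elif p >= mild_p:
        level = 1  # moderate probability: mild intervention only
    else:
        level = 0
    # A rising probability escalates the response by one step (never past the top)
    if 0 < level < 3 and slope > 0:
        level += 1
    return LEVELS[level]
```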

The system may include a mechanism for a patient to manually flag when a seizure occurs. For example, a button may be provided on the portable unit to record that a seizure has occurred, even when the system did not predict it. Brain activity or other physiologic data sensed by the electrodes may be stored in memory for a predetermined time period prior to the false negative seizure event, to be downloaded (by phone, modem, etc.) to a monitoring center for further analysis. In this way, it is possible to record false negative predictions and, more importantly, to obtain the brain activity data that preceded the unpredicted seizure event so that the system can be retrained to predict such seizures more accurately. In addition, by permitting a patient to manually record what he/she believes is a seizure event, it is possible to diagnose events that the patient thinks are seizures but actually are not epileptic seizures.
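The pre-event storage described above behaves like a ring buffer. A minimal sketch, assuming samples arrive one at a time and that the hypothetical `PreEventBuffer` class holds the “predetermined time period” as a fixed number of samples:

```python
from collections import deque


class PreEventBuffer:
    """Keep the most recent `capacity` samples; copy them out when the patient flags a seizure."""

    def __init__(self, capacity):
        self._recent = deque(maxlen=capacity)  # oldest samples fall off automatically
        self.flagged_events = []               # (label, samples) awaiting download

    def push(self, sample):
        self._recent.append(sample)

    def flag_event(self, label="patient-flagged seizure"):
        # Snapshot the pre-event window so later samples do not overwrite it
        self.flagged_events.append((label, list(self._recent)))
```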

The system may include the capability of communicating with persons other than the patient. For example, a cellular telephone, two-way pager, or other transmitter may be connected to or interfaced with the portable unit to send seizure warning signals to a physician, family member, friend, etc. Similarly, warnings can be sent over the Internet (by way of e-mail or other instant messaging).

EXAMPLES OF USEFUL FEATURES AND HOW THEY ARE EXTRACTED

Features are quantitative or qualitative measures that are distilled from raw data and contain relevant information for tasks such as classification and prediction. In the classical pattern recognition field, feature extraction refers to forming good linear combinations of variables. Computational intelligence has given rise to other interpretations, such as considering a hidden layer in a neural network as a nonlinear feature extractor. In the medical field, “features” are often referred to as “parameters.” In addition, some practitioners equate a feature with a single number (a scalar), while others equate it with an abstract quality that is measured using several numbers (a vector). For purposes of the present invention, a feature is defined as an individual variable. Thus, a “feature vector” is simply a collection of features organized in vector form.

A “feature library” is a collection of features that are extracted by algorithms from raw brain activity data. With reference to FIG. 7, there are two levels of features: instantaneous and historical. Instantaneous features are computed from observation windows that are essentially 1.25 seconds or less in duration. Historical features span longer periods and are based on the evolution of instantaneous features, as shown in FIG. 7. The feature vector is derived from the feature library.

Some examples of instantaneous features include: autoregressive coefficients, spectral entropy, coherence, cross-covariance, correlation between entropies, energy, energy derivative, entropy, filtered amplitude squared, fractal dimension, fourth power indicator (defined hereinafter), mean frequency, nonlinear decorrelation lag, nonlinear energy operator, number of zero crossings, Pisarenko harmonic decomposition, power distribution in frequency bands, principal components, principal Lyapunov exponent, real cepstrum, spike (occurrence, amplitude, curvature), third-order spectrum, wavelet subband energy, wavelet compression coefficients, epileptiform discharge complexity (a measure of the number of peaks, amplitude, frequency content, and morphology of spike waveforms), amount of background disruption (amount of deviation from baseline time and frequency characteristics of electrical signals), regional coherence (coherence of activity in a focal brain region compared to that of other regions in the brain), and zero crossings of energy derivative. Since many features are widely known in the field, formulas are provided below only for new or less commonly known features.

Fourth Power Indicator. $P = \frac{1}{N}\sum_{n=0}^{N-1} E[n]^{4}$, with $N = 10$ and overlap $= 5$,

where the energy derivative is $dE[n] = E[n] - E[n-1]$.
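A minimal sketch of the fourth power indicator and the energy derivative, taking the formula as printed (fourth power of E[n] over windows of N = 10 points advanced by N − overlap = 5 points). The function names are hypothetical, and `E` is assumed to be an energy sequence such as the sliding-window average energy defined under Accumulated Energy.

```python
def fourth_power_indicator(E, N=10, overlap=5):
    """P = (1/N) * sum of E[n]^4 over each window of N points; windows advance N - overlap."""
    hop = N - overlap
    return [sum(e ** 4 for e in E[s:s + N]) / N for s in range(0, len(E) - N + 1, hop)]


def energy_derivative(E):
    """dE[n] = E[n] - E[n-1], defined from the second sample onward."""
    return [E[n] - E[n - 1] for n in range(1, len(E))]
```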

Pisarenko Harmonic Decomposition. The absolute values of the first three coefficients (the next three magnitudes are reflected) of a fifth-degree characteristic polynomial, $\sum_{i=0}^{5} a_i z^{-i}$,

whose roots lie on the unit circle. The roots represent poles of a linear discrete-time system whose impulse response is a sum of sinusoids identified from the data sequence x[n]. The vector of coefficients a_i is the eigenvector associated with the smallest eigenvalue of the 6×6 covariance matrix of the convolution matrix of x[n]. This is virtually identical to the rotation vector associated with the smallest singular value of the mean-removed embedding matrix of x[n] (principal state-space reconstruction with embedding dimension = 6 and delay = 1). A small difference between the methods arises from the mean estimates.
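The eigenvector computation can be sketched in pure Python. Assumptions: the “covariance matrix” is formed as XᵀX/M from the M rows [x[n], x[n−1], ..., x[n−5]] of the convolution matrix without mean removal, and a small cyclic Jacobi solver (written here only to keep the example dependency-free; a library eigensolver would normally be used) supplies the eigenvectors. The function names are hypothetical.

```python
import math


def jacobi_eigh(A, max_sweeps=50, tol=1e-20):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) with the eigenvectors stored as the columns of V.
    """
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-300:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                A[p][q] = A[q][p] = 0.0  # zero in exact arithmetic; remove round-off
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def pisarenko_features(x, order=5):
    """Return (a, [|a_0|, |a_1|, |a_2|]) for the minimum-eigenvalue eigenvector a."""
    m = order + 1
    rows = [[x[n - i] for i in range(m)] for n in range(order, len(x))]  # convolution matrix
    R = [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(m)] for i in range(m)]
    vals, V = jacobi_eigh(R)
    k = min(range(m), key=vals.__getitem__)
    a = [V[i][k] for i in range(m)]
    return a, [abs(c) for c in a[:3]]
```

For a test signal such as a constant plus two sinusoids, the null space is one-dimensional, the polynomial has zeros at z = 1 and at the two frequency pairs on the unit circle, and |a_i| = |a_{5−i}|, which is the reflection noted above.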

Nonlinear Energy Operator. $\mathrm{NEO}[n] = x^{2}[n-1] - x[n]\,x[n-2]$.
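The operator is a one-line computation. As a check, for x[n] = A cos(Ωn + φ) it returns exactly A² sin²Ω at every sample, since cos²a − cos(a + Ω)cos(a − Ω) = sin²Ω. The function name is hypothetical.

```python
def nonlinear_energy(x):
    """NEO[n] = x[n-1]^2 - x[n] * x[n-2], defined for n >= 2."""
    return [x[n - 1] ** 2 - x[n] * x[n - 2] for n in range(2, len(x))]
```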

Spectral Entropy (SE). SE provides a measure of organization in neural function, which preliminary experiments suggest may be useful in seizure prediction and detection. As an example, a window length of 30 seconds is useful, such that the data for each channel is divided into consecutive segments, x_i, of length N = 2160 points with a 46% overlap.

First, the reference spectrum is found from: $P_i(\omega_k) = \frac{1}{N}\left|X(\omega_k)\right|^{2}$

where X(ω_k) is the discrete-time Fourier transform (DTFT) of the segment evaluated at frequency ω_k: $X(\omega_k) = \sum_{n=0}^{N-1} x[n]\,e^{-j\omega_k n}$

A variety of windowing functions were evaluated to determine the best method for smoothing the processed signal. Ultimately, the periodograms were smoothed using a Bartlett window. The smoothed periodograms are represented by: $S_i(\omega_k) = \sum_{u} w_u\, P_i(\omega_{k-u})$

The coefficients of the Bartlett window are, for n odd: $w[k] = \begin{cases} \frac{2(k-1)}{n-1}, & 1 \le k \le \frac{n+1}{2} \\ 2 - \frac{2(k-1)}{n-1}, & \frac{n+1}{2} \le k \le n \end{cases}$ and, for n even: $w[k] = \begin{cases} \frac{2(k-1)}{n-1}, & 1 \le k \le \frac{n}{2} \\ \frac{2(n-k)}{n-1}, & \frac{n}{2} + 1 \le k \le n \end{cases}$

The spectral entropy is then found to be:

$H = -\sum_{k} S(\omega_k)\,\log_2 S(\omega_k)$
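The steps above (segmentation, periodogram, Bartlett smoothing, entropy) can be sketched in pure Python. Assumptions not stated in the text: a one-sided periodogram at ω_k = 2πk/N computed by a direct DFT (an FFT would be used in practice), a 5-point smoothing window centred on each bin with terms falling off the ends dropped, and normalisation of the smoothed spectrum to unit sum before taking the entropy. The function names are hypothetical.

```python
import cmath
import math


def bartlett(n):
    """Bartlett coefficients w[1..n] from the piecewise definition (returned 0-indexed)."""
    if n == 1:
        return [1.0]
    w = []
    for k in range(1, n + 1):
        if n % 2 == 1:
            w.append(2 * (k - 1) / (n - 1) if k <= (n + 1) // 2 else 2 - 2 * (k - 1) / (n - 1))
        else:
            w.append(2 * (k - 1) / (n - 1) if k <= n // 2 else 2 * (n - k) / (n - 1))
    return w


def periodogram(seg):
    """One-sided P(w_k) = |X(w_k)|^2 / N at w_k = 2*pi*k/N, k = 0..N/2, via a direct DFT."""
    N = len(seg)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(seg))) ** 2 / N
            for k in range(N // 2 + 1)]


def spectral_entropy(seg, smooth_len=5):
    P = periodogram(seg)
    w, h = bartlett(smooth_len), (smooth_len - 1) // 2
    # S(w_k) = sum_u w_u P(w_{k-u}), centred on k; terms that fall off either end are dropped
    S = [sum(w[u + h] * P[k - u] for u in range(-h, h + 1) if 0 <= k - u < len(P))
         for k in range(len(P))]
    total = sum(S)
    probs = [s / total for s in S]  # assumption: normalise to a unit-sum distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)


def segments(x, N, overlap=0.46):
    """Consecutive segments of N points overlapping by the given fraction."""
    hop = max(1, round(N * (1 - overlap)))
    return [x[s:s + N] for s in range(0, len(x) - N + 1, hop)]
```

A pure tone concentrates the smoothed spectrum in three bins (entropy 1.5 bits for this window), while an impulse spreads it almost evenly (close to log₂ of the number of bins), which is the sense in which SE measures organization.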

Examples of historical features include those obtained from statistical process control charts for detecting special-cause variability between observed subgroups: accumulated energy, cumulative sum, exponentially weighted moving average (EWMA), histogram, minimax (minimum and maximum of n standardized variables), np-chart (number of “defectives”), r-chart (range), s-chart (standard deviation), and xbar-chart (mean). Subgroups are obtained by successive nonoverlapping blocks (subgroup windows) of EEG instantaneous features (individual samples in each subgroup deployed through time), with subgroup sample sizes greater than or equal to 1. A second kind of subgroup can be obtained from an instantaneous single-channel feature applied across multiple channels (individual samples in each subgroup deployed through space). In its basic form, each point in the chart reduces the subgroup window of a given feature to a single number. The single number is, for example, the mean value of the fractal dimension, the standard deviation of the energy, or the number of spikes within the subgroup window. When this number goes outside 3 standard deviations (3σ) above or below a center line, an “out-of-control” condition is recorded. The system estimates the center line and control limits from data under “in-control” (non-preseizure) conditions.
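A minimal xbar-chart sketch for one instantaneous feature. The subgroup size, the 3σ factor, and the use of the standard deviation of the baseline subgroup means as σ are assumptions of this example (textbook charts often estimate σ from average ranges instead); the function names are hypothetical.

```python
import statistics


def subgroup_means(feature, size):
    """Mean of each successive non-overlapping subgroup window of `size` samples."""
    return [statistics.fmean(feature[s:s + size]) for s in range(0, len(feature) - size + 1, size)]


def xbar_chart(baseline, monitored, size, k=3.0):
    """Flag subgroups of `monitored` whose mean lies outside center +/- k*sigma.

    Center line and sigma are estimated from in-control (non-preseizure) baseline data.
    """
    ref = subgroup_means(baseline, size)
    center, sigma = statistics.fmean(ref), statistics.stdev(ref)
    return [abs(m - center) > k * sigma for m in subgroup_means(monitored, size)]
```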

Accumulated Energy (AE). The AE feature is extracted from the energy of the measured IEEG time series. If the IEEG sequence is denoted as x(n), then the instantaneous energy of x(n) is given by x²(n). Using a sliding window, the energy of the signal becomes an average energy: $E[n] = \frac{1}{N_1}\sum_{i=n-N_1+1}^{n} x(i)^{2}$

where N₁ is the size of the sliding window expressed in number of points. AE contains historical information and represents a discrete integral of the energy over time. It is calculated as follows. From the energy records obtained from the expression above, a new moving average window of several points, such as 10, is slid through the energy record with an overlap of 5 points, and a new sequence is derived as the cumulative sum of these values. The equation below summarizes the mathematical computation of AE: $AE[n] = \sum_{i=5(n-1)+1}^{5(n-1)+10} E[i] + AE[n-1]$
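A minimal sketch of both equations, with 0-based indexing so that AE[0] sums E[0..9], AE[1] adds E[5..14], and so on. The function names are hypothetical.

```python
def sliding_energy(x, N1):
    """E[n] = (1/N1) * sum_{i=n-N1+1}^{n} x(i)^2, for every n with a full window."""
    return [sum(v * v for v in x[n - N1 + 1:n + 1]) / N1 for n in range(N1 - 1, len(x))]


def accumulated_energy(E, width=10, hop=5):
    """AE[n] = (sum of E over the n-th window of `width` points) + AE[n-1]; windows advance by `hop`."""
    ae, total = [], 0.0
    for s in range(0, len(E) - width + 1, hop):
        total += sum(E[s:s + width])
        ae.append(total)
    return ae
```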

In addition to the basic extreme pattern, there are other patterns in control charts that signal anomalies. Most are detectors of “non-randomness” based on counters. Examples include: 2 of 3 consecutive points outside 2σ limits, 4 of 5 consecutive points outside 1σ limits, 15 consecutive points within 1σ limits, 8 consecutive points on the same side of the center line, a trend of 6 consecutive points increasing or decreasing, 14 consecutive points alternating between increase and decrease, periodicity, and the number of extremes per history window. Binary sequences can be used to flag the presence or absence of the patterns, or the sequences can be left as “continuous” running counts. The history window is infinite for EWMA, larger than the subgroup windows for counters, and equal to the subgroup window for each mean estimate of a feature.
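Three of these counters can be sketched as binary sequences that flag the point at which each pattern completes. The function names are hypothetical; the 2-of-3 rule here counts points outside ±2σ on either side, as the text states, whereas the classical Western Electric form requires them to be on the same side.

```python
def same_side_run(points, center, run=8):
    """Flag when `run` consecutive points lie strictly on the same side of the center line."""
    flags, count, side = [], 0, 0
    for p in points:
        s = (p > center) - (p < center)  # +1 above, -1 below, 0 on the line
        if s == 0:
            count = 0
        elif s == side:
            count += 1
        else:
            count = 1
        side = s
        flags.append(count >= run)
    return flags


def two_of_three_outside(points, center, sigma):
    """Flag when at least 2 of the last 3 points lie outside center +/- 2*sigma."""
    flags = []
    for i in range(len(points)):
        window = points[max(0, i - 2):i + 1]
        outside = sum(abs(p - center) > 2 * sigma for p in window)
        flags.append(len(window) == 3 and outside >= 2)
    return flags


def trend(points, run=6):
    """Flag when `run` consecutive points are strictly increasing or strictly decreasing."""
    flags, up, down = [], 1, 1
    for i, p in enumerate(points):
        if i > 0:
            up = up + 1 if p > points[i - 1] else 1
            down = down + 1 if p < points[i - 1] else 1
        flags.append(up >= run or down >= run)
    return flags
```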

Pre-ictal Prodromes. Pre-ictal prodromes are specific pre-ictal patterns that occur on the EEG, either visible to the eye or only discovered computationally, which build prior to and herald seizure onset. They may increase in their frequency of occurrence, their amplitude, or their duration as a seizure approaches.

In addition to preexisting features, an optimal set of artificial features customized for a particular patient and/or prediction task can be constructed. Given a set of features, it is known how to prescribe optimal classifiers and how to create near-optimal ones empirically using neural networks. However, the power set of these features may not convey the maximum information available in the raw data. The act of prescribing the features themselves is routinely dismissed as an “art”: an inductive problem guided only by trial and error or intuition about the physics.

The following terminology: feature extraction, feature creation, feature optimization, feature learning, feature discovery, feature mapping, feature augmentation, feature transformation, and signal or data projection, appears in the prior art in contexts that always boil down to working with the same finite set of pre-chosen features:

(1) selecting a feature subset from a predefined list with methods such as forward and backward sequential selection, or combined add-on/knock-out,

(2) creating features as linear combinations of input features (the classical definition of feature extraction), such as principal components, or creating feature vectors as linear combinations of raw inputs with methods such as adaptive noise filtering and time-frequency transforms,

(3) creating features as nonlinear combinations of the input features, considering a hidden layer in a neural network as a nonlinear feature extractor, or joining inputs by algebraic operators.

Recognition rate improvements obtained from these methods stem from refining the decision structure by making patterns more obvious, not from creating new information; derived features cannot contain more information than is already hidden in the original set. The “art” of specifying the original features arises because they are somehow “chosen” from an infinite list. A heuristic approach is proposed that amounts to searching in this much larger space of possible features.

If performance depends so much on the input features, the challenge is to decide where to draw the line between the features and the predictor structure. In the present invention, the line is initially drawn as far back as the raw data. Learned artificial features are customized for the given task and presented to a predictor structure as if they were conventional features computed procedurally. This is based on the following observation: since a feature (quantitative, or qualitative turned quantitative) is obtained from a formula or algorithm that maps a raw input set into a scalar, a neural network is capable of learning and implementing that map.

As shown in FIG. 6, an artificial intelligence (AI) network 200 is employed to generate the feature vector. The neurally computed features are the outputs of feedforward networks, or the stable equilibria of recurrent networks, and can mimic conventional features or be entirely artificial features. Recurrent WNNs may yield more compact solutions at the expense of additional training and stability considerations.

The learning phase required for neural computation of features demands a great deal of computational resources for a large input array; therefore, reaping the full benefits of this method involves the following prototypical situation. Group A has a technical prediction problem, and either the current solution is unsatisfactory or an improvement is sought. A raw database is sent to a high-performance computing facility, where group B synthesizes a set of artificial features off-line, customized for group A's problem within stipulated time and computer capability constraints. The result is downloaded back to group A as a “black box” of pre-optimized features, which are then neurally computed on-line. By definition, the only way for group A to further improve performance (if at all possible) is by looking for different or additional raw measurements.

Artificial features can be synthesized from unsupervised learning, reinforcement learning, and supervised learning. For example, for supervised learning, it is clear that the single best artificial feature is the output of the final predictor itself (it compactly conveys the premonitory class or probability information), but that is precisely the unknown when the problem is first tackled. Taken this far, an off-line training session that uses the desired targets as outputs produces a complete solution in a single WNN; in this case, the input feature and the prediction output are the same thing. Typically, however, the off-line synthesis is subject to time constraints, and a sub-optimal output will be produced. This resulting artificial feature (or feature vector), being already close to the desired solution, is better suited than conventional ad hoc features for later training of the predictor structure chosen by the user. This is somewhat similar to the way in which the known Group Method of Data Handling composes the desired output solution by using ever closer partial solutions as inputs.

Neurally computed features are fed as input features to the predictor structures. Under certain conditions, it is advantageous to compute features neurally as opposed to procedurally, even when the features are not artificial. For example, the computation of the correlation dimension D_c (a measure of fractal dimension found to be valuable in seizure detection and prediction) involves many steps: sequentially hypothesizing embedding dimensions, computing pairwise distances, extracting and offsetting the IEEE-standard exponent of 32-bit floating-point numbers, binning distances to obtain a correlation integral, fitting least-squares lines to read dimensions off the slopes, and averaging results to reduce variance. The whole process is a transformation of vectors (e.g., 256 points long) to scalars that are only valid in the range of about 0.5 to 3.5. This procedure makes the D_c feature very impractical for real-time implementation, but a neural version of it is useful in accordance with the present invention.

FEATURE SYNERGY, SUBSET SELECTION AND FUSION

The focus of seizure prediction research to date has been on finding a single feature (or multiple channels or multiple frequency bands) that by itself will give off a clear premonitory signal. Retrospective examination of features has shown promise, but no perfect consistency has been found for any one feature acting alone in discriminating between the pre-seizure state and baseline EEGs. Since pre-ictal changes in raw EEG are notoriously elusive even to the trained electroencephalographer's eye, it is not surprising that an arbitrary single feature is not fully predictive. A single feature is a partial descriptor of the underlying EEG, and all that can be seen from its temporal plot is a one-dimensional projection of its amplitude evolution folding upon itself. Higher dimensions in feature space are required to consistently detect subtle changes prior to seizure. The present invention introduces the use of feature synergism, wherein multiple features of a different nature, each singly inconsistent, are combined in a particular manner to increase consistency.

Many of the features in the feature library can be redundantly correlated to others, or can be completely irrelevant for the particular prediction task. Furthermore, the use of all features in the library places a large computational burden on the learning and analysis of the system. Therefore, a feature vector comprises a subset of features in the feature library. There are $\binom{N_f}{n} = \frac{N_f!}{n!\,(N_f - n)!}$ possible ways of choosing n-dimensional feature vectors from the universe of N_f features, n ≤ N_f. This can grow so large that exhaustive search becomes prohibitive. For example, $\binom{30}{5}$, $\binom{30}{10}$, $\binom{100}{5}$, and $\binom{100}{10}$ yield 142,506, 30(10)⁶, 75(10)⁶, and 1.7(10)¹³ ways, respectively, of choosing a feature vector. An expedient strategy to deal with this combinatorial explosion is to find the smallest feature subset that “works” through a forward sequential search. Improved versions of sequential search, such as add-on/knock-out algorithms, may be employed.
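The counts quoted above can be checked with Python's `math.comb`; `subset_counts` is a hypothetical helper name.

```python
from math import comb


def subset_counts(cases=((30, 5), (30, 10), (100, 5), (100, 10))):
    """Number of n-feature subsets of N_f features for each (N_f, n) pair."""
    return {(Nf, n): comb(Nf, n) for Nf, n in cases}


for (Nf, n), count in subset_counts().items():
    print(f"C({Nf},{n}) = {count:,}")
```

This prints 142,506, 30,045,015, 75,287,520, and 17,310,309,456,440, matching the rounded figures in the text.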

During the “off-line” analysis, each of the N_f features derived from actual brain activity for an individual is first individually scored based on validation error, as explained hereinafter. Such scores are sometimes given as distinguishability measures based on Gaussian assumptions about the one-dimensional conditional distributions p(x|S) and p(x|NS) of each feature. However, the features may be multimodal and overlap in ways that require more than one separatrix point. Thus, the preferred method is to score features based on performance on actual system outputs. After the first round of N_f scores, the best feature is made a permanent part of the feature vector. On the second round, the still-unused feature that works best in conjunction with the first one is found. The process is iterated until n features have been chosen (n being prefixed), or until scores exceed a desired level. This technique requires only nN_f − n(n−1)/2 scores. The numbers in the previous example reduce to 140, 255, 490, and 955, respectively. The predictor found with the best feature subset is deemed the final trained model. Training of the intelligent prediction subsystem is explained hereinafter.
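The forward sequential search can be sketched as a greedy loop. `score` is a caller-supplied function (hypothetical here) that would return the validation error of a predictor trained on a candidate subset; lower is better. The evaluation counter reproduces the nN_f − n(n−1)/2 figure.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(features: Sequence[str], n: int,
                   score: Callable[[List[str]], float]) -> Tuple[List[str], int]:
    """Greedy forward sequential selection; returns (chosen features, number of scores)."""
    chosen: List[str] = []
    remaining = list(features)
    evaluations = 0
    while len(chosen) < n and remaining:
        best = None
        for f in remaining:
            s = score(chosen + [f])  # try each unused feature alongside the permanent ones
            evaluations += 1
            if best is None or s < best[0]:
                best = (s, f)
        chosen.append(best[1])       # the winner becomes a permanent part of the vector
        remaining.remove(best[1])
    return chosen, evaluations
```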

Feature fusion refers to the way in which features are combined before reaching a prediction decision. Feature fusion is accomplished by presenting the features in parallel to the system. In an alternative embodiment of the invention, features are fused using active perception (see I. Dar, An Intelligent Sensor Fusion Approach to Pattern Recognition with an Application to Bond Validation of Surface Mount Components, doctoral dissertation, Georgia Institute of Technology, September 1996) and Dempster-Shafer theory (see G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, N.J., 1976). To arrive at a prediction, features are presented one by one to the corresponding one-dimensional WNN classifier. Given the ith feature x, the output of the WNN predictor is an estimate of the conditional pre-seizure class probability P_T(S|x). A mass function can be derived from this information: the probability values are assigned to the singleton classes pre-seizure (S) and non-pre-seizure (NS), and zero is assigned to all other subsets of the frame of discernment (null and all). This vector is renormalized, if necessary, so that the sum of the masses equals 1, as required in Dempster-Shafer theory. From the 2nd feature forward, the mass function represents an accumulation of evidence between the new evidence presented by the ith feature and all previous ones via Dempster's rule of combination. The degree of certainty (DOC) distribution is computed after presentation of each new feature. After enough evidence has been processed to reach a preset DOC level, the classification is the class whose DOC is maximum. The DOC computation is explained in H. Kang, J. Chang, I. Kim, and G. Vachtsevanos, “An Application of Fuzzy Logic and Dempster-Shafer Theory to Failure Detection and Identification,” IEEE Proc. 30th Conf. Decision & Control, Brighton, England, pp. 1555-1560, 1991.
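Dempster's rule on the two-class frame {S, NS} can be sketched as follows. The mass assignment follows the text; the stopping rule uses the largest combined singleton mass as a stand-in for the DOC of Kang et al., which is an assumption of this sketch, and the function names are hypothetical. The key `"SNS"` holds mass on the whole frame, kept for generality even though the text assigns it zero.

```python
def combine(m1, m2):
    """Dempster's rule on {S, NS}; masses keyed 'S', 'NS', and 'SNS' (the whole frame)."""
    conflict = m1["S"] * m2["NS"] + m1["NS"] * m2["S"]
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    k = 1.0 - conflict  # renormalise after discarding conflicting mass
    return {
        "S": (m1["S"] * m2["S"] + m1["S"] * m2["SNS"] + m1["SNS"] * m2["S"]) / k,
        "NS": (m1["NS"] * m2["NS"] + m1["NS"] * m2["SNS"] + m1["SNS"] * m2["NS"]) / k,
        "SNS": m1["SNS"] * m2["SNS"] / k,
    }


def mass_from_probability(p):
    """P_T(S|x) to singleton S, the remainder to NS, zero to the full frame."""
    return {"S": p, "NS": 1.0 - p, "SNS": 0.0}


def sequential_fusion(probabilities, certainty=0.95):
    """Fuse per-feature WNN outputs one by one until a preset certainty is reached."""
    m = mass_from_probability(probabilities[0])
    used = 1
    for p in probabilities[1:]:
        if max(m["S"], m["NS"]) >= certainty:
            break
        m = combine(m, mass_from_probability(p))
        used += 1
    label = "S" if m["S"] >= m["NS"] else "NS"
    return label, m, used
```

With per-feature outputs 0.7, 0.8, 0.9, and 0.6, the combined mass on S passes 0.95 after the third feature (0.504/0.510 ≈ 0.988), so the fourth is never consulted.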

WNN PREDICTOR SYNTHESIS

Evidence suggests that there are pre-ictal changes in EEG signals that herald evolution toward a seizure. Consequently, it is more useful to define the outputs of the system to indicate an expected time of seizure onset and the degree of confidence or probability that a seizure will occur within that time period.

For example, as shown in FIG. 7, the prediction can be divided into four prediction horizons: 1 minute, 10 minutes, 1 hour, and 1 day. A probability P that a seizure will occur is generated by a different WNN trained for each of the four prediction horizons; for example, P is 0.5 for the 1-minute horizon, 0.7 for the 10-minute horizon, 0.4 for the 1-hour horizon, and 0.2 for the 1-day horizon. This time-oriented probability measure or predictor is described in more detail hereinafter in conjunction with FIG. 8. In general, N WNNs are employed, where N is the number of prediction horizons for which a probability measure is to be output.

More generally, with reference to FIG. 8, the prediction output is defined to be the conditional probability P_T(S|x), that is, the probability that one (or more) seizure(s) will occur at any time within the next T minutes, given the observed measurements x. This formulation allows for both a “hard” prediction (using a threshold on the output) and a measure of certainty regarding the imminent seizure event (the unquantized output). The WNN learns an estimate of the P_T(S|x) function from data even though the desired target probabilities are unknown. All that is required is that the desired outputs be labeled as 1 for pre-seizure and 0 for non-pre-seizure (instead of actual probabilities), and that the WNN be trained using a least-squares error criterion with a logistic sigmoid in the output unit. It can be shown that this amounts to a logistic nonlinear regression that gives an estimate of probability in the output independently of the feature distribution. As shown in FIG. 8, data are labeled as pre-seizure (S) and non-pre-seizure (NS) classes. All 30-minute periods beginning with each marked electrographic onset are dropped from the database for prediction purposes, since by definition they represent non-predictive data that would corrupt the sought-after dependencies.
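A minimal labelling sketch, assuming sample and onset times are given in minutes: a sample is labelled 1 if an onset falls within the next T minutes, and samples in the 30 minutes starting at each onset are dropped. `label_samples` is a hypothetical helper.

```python
def label_samples(times_min, onsets_min, T, exclusion_min=30.0):
    """Return (time, label) pairs: 1 if a seizure onset occurs in (t, t + T], else 0."""
    labeled = []
    for t in times_min:
        if any(o <= t < o + exclusion_min for o in onsets_min):
            continue  # non-predictive data starting at an onset is dropped
        y = int(any(t < o <= t + T for o in onsets_min))
        labeled.append((t, y))
    return labeled
```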

From the above considerations, the basic implementation of a T-minute WNN predictor is a multiple-input, single-output transformation: $\hat{P}_T = \frac{1}{1 + e^{-u}}, \quad u = \sum_{j=1}^{M} c_j\,\psi_{A_j,b_j}(x) + c_1^{lin} x_1 + \cdots + c_n^{lin} x_n + c_0^{lin},$

$\psi_{A_j,b_j}(x) = \psi\left(\sqrt{(x - b_j)\,A_j\,(x - b_j)^T}\right),$

$\psi(x) = \min\left\{\max\left\{\tfrac{3}{2}(1 - |x|),\, 0\right\},\, 1\right\}\cos\left(\tfrac{3}{2}\pi x\right),$

where x is a row vector of inputs [x₁ ... x_n], b_j is a translation vector associated with the jth wavelet node, A_j is a symmetric positive semi-definite squashing matrix, and M is the number of wavelet nodes. The dependence of this WNN on T is implicit by way of the training data set that is used to tune the network parameters A_j, b_j, and c.
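A minimal forward pass of the T-minute WNN predictor, written directly from the equations above. The parameter containers (a list of `(A_j, b_j)` pairs and plain weight lists) and the function names are assumptions of this sketch; training of A_j, b_j, and c is not shown.

```python
import math


def psi(r):
    """Mother wavelet psi(r) = min{max{(3/2)(1 - |r|), 0}, 1} * cos((3/2) * pi * r)."""
    return min(max(1.5 * (1.0 - abs(r)), 0.0), 1.0) * math.cos(1.5 * math.pi * r)


def logistic(u):
    """Numerically stable 1 / (1 + e^-u)."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def wnn_probability(x, nodes, c, c_lin, c0):
    """Estimate P_T for input row vector x.

    nodes: list of (A_j, b_j), A_j an n x n symmetric PSD matrix, b_j a length-n list;
    c: wavelet output weights c_j; c_lin: linear weights c_1..c_n; c0: bias c_0.
    """
    u = c0 + sum(w * xi for w, xi in zip(c_lin, x))
    for cj, (A, b) in zip(c, nodes):
        d = [xi - bi for xi, bi in zip(x, b)]
        q = sum(d[i] * A[i][k] * d[k] for i in range(len(d)) for k in range(len(d)))
        u += cj * psi(math.sqrt(max(q, 0.0)))  # clamp tiny negative round-off
    return logistic(u)
```

Because ψ vanishes for radii of 1 or more, each wavelet node contributes only near its translation b_j, with A_j setting the shape of that neighbourhood; far from every node, the output reduces to the logistic of the linear part.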

FIG. 9 shows a functional layout of WNN modules for analyzing a feature vector {X₁, ..., X_n}. Wavelet nodes 300(1)-300(n) connecting to each output P (with the subscript indicating the number of minutes in the prediction horizon) may be shared. If the intelligent prediction subsystem is implemented without sharing nodes, then the WNN module is effectively 4 separate WNNs, each trained on a corresponding prediction horizon. The number of prediction horizons and their corresponding time intervals may vary.

The number of wavelet nodes M is systematically found based on K-means clusterings of the training data in the input-output space for a successively larger number of clusters. Each clustering is assigned a measure of within- to between-cluster variance. The measure is the inverse of a multidimensional F-ratio, $S = \frac{\sum_{i=1}^{K}\sum_{j=1}^{N_i} \left\| w_j^i - \bar{w}_i \right\|^2 / (N - K)}{\sum_{i=1}^{K} N_i \left\| \bar{w}_i - \bar{w} \right\|^2 / (K - 1)},$

where N is the number of exemplars, K is the number of clusters, w_j^i is an input-output data point [x y] that belongs to the ith cluster, N_i is the number of such points in the ith cluster, w̄_i is the center of the ith cluster, and w̄ is the grand mean. The number of wavelet nodes is taken to be the minimizer of the S function above.
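The criterion S can be computed for a given clustering as follows. The K-means step itself is omitted: `labels` are assumed to come from any K-means run with K ≥ 2 non-empty clusters numbered 0..K−1, and S would then be evaluated for successively larger K. The function name is hypothetical.

```python
def inverse_f_ratio(points, labels):
    """S = [within-cluster SS / (N - K)] / [between-cluster SS / (K - 1)]."""
    K, N, dim = max(labels) + 1, len(points), len(points[0])
    grand = [sum(p[d] for p in points) / N for d in range(dim)]
    within = between = 0.0
    for k in range(K):
        members = [p for p, lab in zip(points, labels) if lab == k]
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        within += sum(sum((p[d] - center[d]) ** 2 for d in range(dim)) for p in members)
        between += len(members) * sum((center[d] - grand[d]) ** 2 for d in range(dim))
    return (within / (N - K)) / (between / (K - 1))
```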

For any given hypothesized WNN structure, training of the network parameters A_j, b_j, and c is cast as a minimization problem with respect to the empirical average squared error function $ASE = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{P}_T^{(i)}\right)^2,$

where y_i are labels in {0,1}. This criterion is used as a guide during minimization using the training set; however, care is taken to select a model that minimizes the expected value of this measure not over the training set, but over all future data. Estimates of the latter can be obtained in principle from regularization or resampling techniques.

From a practical point of view, split-sample validation is by far the simplest effective technique for preventing overtraining of the network and thus preserving generalization. The data set is split into a training set TRN and a validation set VAL (and optionally a test set TST; typical proportions are 60%, 20%, and 20%). Training proceeds by minimization of error over TRN while monitoring the error on VAL. The best WNN on VAL is recorded at every iteration. Typically, the error over TRN drops to arbitrarily small values (provided the WNN is complex enough), while the error over VAL first decreases and then increases steadily. The final network chosen is the one that minimizes the error over VAL, which is a form of early stopping during training. Note that minimizing VAL error in this fashion is not the same as overtraining on VAL (which can always be driven to zero). VAL is ideally a representative sample of the universe of all future exemplars; this scheme introduces a bias to the extent that VAL deviates from this ideal. Using yet another unseen data set, TST, a final test is usually run to assess the generalization error. The actual minimization algorithms employed, such as Levenberg-Marquardt and genetic algorithms, are well known to those skilled in the art.
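A minimal early-stopping sketch using the 60/20/20 split. To keep it short, the model is a one-feature logistic unit (the linear part of the WNN only) trained by plain gradient descent on ASE, standing in for Levenberg-Marquardt or a genetic algorithm; the learning rate, iteration count, and function names are arbitrary choices for the illustration.

```python
import math
import random


def logistic(u):
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def ase(params, data):
    """Average squared error between {0,1} labels and the logistic output."""
    w, b = params
    return sum((y - logistic(w * x + b)) ** 2 for x, y in data) / len(data)


def split_sample(data, seed=0, fractions=(0.6, 0.2)):
    """Shuffle, then cut into TRN / VAL / TST (default 60% / 20% / 20%)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    a = int(fractions[0] * len(rows))
    b = a + int(fractions[1] * len(rows))
    return rows[:a], rows[a:b], rows[b:]


def train_with_early_stopping(trn, val, lr=0.5, iters=200):
    """Gradient descent on ASE over TRN; keep the parameters with the lowest ASE on VAL."""
    w = b = 0.0
    best = (ase((w, b), val), 0, (w, b))  # (VAL error, iteration, parameters)
    for it in range(1, iters + 1):
        gw = gb = 0.0
        for x, y in trn:
            p = logistic(w * x + b)
            g = -2.0 * (y - p) * p * (1.0 - p) / len(trn)  # d(ASE)/du for this sample
            gw += g * x
            gb += g
        w, b = w - lr * gw, b - lr * gb
        err = ase((w, b), val)
        if err < best[0]:
            best = (err, it, (w, b))
    return best
```

The returned parameters are those at the VAL minimum rather than at the last iteration; TST would be used only once, afterwards, to report generalization error.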

In order to obtain binary-type alarms, thresholds are set on the continuous probability outputs; other methods may also be suitable. A classification model can be obtained by quantizing the output of a probability model; however, such quantization is most useful for gauging the final performance of the probability model. Alternatively, the classification model can be trained directly as a classifier with a hard limiter in place of the sigmoid output unit:

$C_T(x) = H(u),$

where u has the same form as in the equation above and H(u) is 1 for u ≥ 0 and 0 otherwise. In this case, classification model synthesis is cast as a minimization problem with respect to the empirical average misclassification error (AME), which is the overall fraction of wrong predictions: $AME = 1 - OCR = \frac{N - N_{CS} - N_{CNS}}{N},$

where OCR is the overall correct rate, $N_{CS}$ is the number of correctly predicted positives, $N_{CNS}$ is the number of correctly predicted negatives, and N is the total number of seizure and no-seizure examples. The expected value of this quantity can be minimized using a genetic algorithm and a split-sample validation strategy. Other error metrics that assign different weights to false-alarm rates and prediction-to-onset times (like a negative detection “delay”) may be used as well.
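A minimal sketch of the hard-limiter classifier output and the AME measure, assuming the pre-output activations u have already been computed by the network; the activation and label values below are hypothetical.

```python
import numpy as np

def hard_limiter(u):
    """H(u) = 1 for u >= 0, 0 otherwise (replaces the sigmoid output unit)."""
    return (np.asarray(u) >= 0).astype(int)

def average_misclassification_error(y, y_pred):
    """AME = 1 - OCR = (N - N_CS - N_CNS) / N."""
    y = np.asarray(y)
    y_pred = np.asarray(y_pred)
    n = y.size
    n_cs = int(np.sum((y == 1) & (y_pred == 1)))   # correctly predicted positives
    n_cns = int(np.sum((y == 0) & (y_pred == 0)))  # correctly predicted negatives
    return (n - n_cs - n_cns) / n

u = np.array([-1.2, 0.0, 0.7, -0.3, 2.1])   # hypothetical pre-output activations
labels = np.array([0, 0, 1, 1, 1])
pred = hard_limiter(u)                        # -> [0, 1, 1, 0, 1]
print(average_misclassification_error(labels, pred))  # -> 0.4
```

Because H(u) is not differentiable, gradient methods cannot be applied to AME directly, which is consistent with the use of a genetic algorithm for this criterion.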

The following are practical examples showing how to implement probability estimators using WNNs with synthetic and real data.

In a first experiment, 200 samples of a normally distributed feature with two different means conditioned on equiprobable states were used to train a WNN with logistic output. The {0,1} target outputs were pre-warped as $-\log\left(1/((1-2\varepsilon)y_i+\varepsilon) - 1\right)$ (the numerical inverse of the logistic function) to obtain a better initialization than that provided by the equation for S above. A Gauss-Newton method was used to solve the nonlinear least squares problem. Since only the linear portion of the WNN was required for this task, the correct probability function was very easily found.
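The pre-warping step can be sketched as below. The value of ε is not specified in the text, so the default used here is an assumption; the point is that mapping {0,1} targets through the inverse logistic gives finite targets that the logistic output maps back to ε and 1−ε.

```python
import numpy as np

def prewarp_targets(y, eps=1e-3):
    """Map {0,1} targets through the inverse logistic:
    -log(1/((1-2*eps)*y + eps) - 1), i.e. logit((1-2*eps)*y + eps).
    The default eps is an assumed value."""
    p = (1.0 - 2.0 * eps) * np.asarray(y, dtype=float) + eps
    return -np.log(1.0 / p - 1.0)

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

y = np.array([0, 1, 1, 0])
t = prewarp_targets(y, eps=0.01)
print(t)              # finite targets, about -4.595 for 0 and +4.595 for 1
print(logistic(t))    # recovers 0.01 and 0.99
```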

Next, the experiment was repeated with an accumulated feature that resets itself every 10 minutes. Under a simulated baseline state, the feature increased linearly over the range [1, 100]. Under a simulated pre-ictal state, the feature increased linearly from 1 to 49.5 during the first half of the period, and from 49.5 to 150 during the second half. A challenge of this feature is that it behaves identically during the first half of a pre-ictal period and during any non-pre-ictal period. The conditional density $p_{10}(x \mid NS)$ is uniform with height 1/99 between 1 and 100. The conditional density $p_{10}(x \mid S)$ is piecewise uniform with height 1/99 between 1 and 49.5, and height 0.005 between 49.5 and 150. Then, from Bayes' rule, the theoretical class conditional probability function for this problem is: $P_{10}(S \mid x) = \begin{cases} 0.5 & 1 \le x < 49.5 \\ 0.33 & 49.5 \le x < 100 \\ 1 & 100 \le x \le 150 \end{cases}.$
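The three values of $P_{10}(S \mid x)$ follow directly from Bayes' rule with the two piecewise-uniform densities and equal priors. A small sketch that reproduces them (function names are illustrative only):

```python
def p_ns(x):
    """Baseline density: uniform on [1, 100) with height 1/99."""
    return 1.0 / 99.0 if 1.0 <= x < 100.0 else 0.0

def p_s(x):
    """Pre-ictal density: height 1/99 on [1, 49.5), 0.005 on [49.5, 150]."""
    if 1.0 <= x < 49.5:
        return 1.0 / 99.0
    if 49.5 <= x <= 150.0:
        return 0.005
    return 0.0

def posterior_s(x, prior_s=0.5):
    """Bayes' rule P(S|x); equiprobable states by default."""
    num = p_s(x) * prior_s
    den = num + p_ns(x) * (1.0 - prior_s)
    return num / den if den > 0 else float("nan")

for x in (20.0, 75.0, 120.0):
    print(x, round(posterior_s(x), 2))   # -> 0.5, 0.33, 1.0
```

In the middle interval, 0.005 / (0.005 + 1/99) ≈ 0.331, which is the 0.33 in the equation above.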

FIG. 10 shows this function, along with the approximation learned by a 4-node WNN. Since the distinguishing behavior of this feature is that it doubles its slope halfway through the pre-ictal period and reaches amplitudes never seen under baseline, prediction with 100% certainty can be made with the theoretical or the WNN model, but the prediction-to-onset time (PTOT) cannot be earlier than 5 minutes. The average case is PTOT = 2.5 minutes, when the resetting time of the sawtooth exactly matches the start of the 10-minute pre-ictal period. The worst case is PTOT = 0, in which case the predictor degrades to a (best possible) seizure detector.

The a priori probability of seizure is estimated as $P_T^{TRN}(S)$ from the proportion of pre-seizure examples in the training database. If this proportion does not reflect the true frequency of occurrence $P_T^{true}(S)$ in the continuous time line, the estimate of posterior probability given by probability models will be distorted. According to Bayes' rule, the WNN probability estimator should learn the function $P_T^{TRN}(S \mid x) = \frac{p_T(x \mid S)\,P_T^{TRN}(S)}{p_T(x \mid S)\,P_T^{TRN}(S) + p_T(x \mid NS)\,P_T^{TRN}(NS)}.$

The conditional densities $p(x \mid S)$ and $p(x \mid NS)$ could in principle be obtained without regard to the proportion of examples under each class in TRN, and plugged in as two separate WNNs. Because of the denominator, however, rescaling the estimate $P_T^{TRN}(S \mid x)$ learned from training data by the factor $P_T^{true}(S)/P_T^{TRN}(S)$, where the true a priori estimate is learned over longer patient monitoring periods, is not sufficient to correct the estimate, i.e., to obtain $P_T^{true}(S \mid x)$.

Dividing numerator and denominator by the numerator, we obtain $\begin{aligned} P_T^{TRN}(S \mid x) &= \frac{1}{1 + \dfrac{p_T(x \mid NS)\,P_T^{TRN}(NS)}{p_T(x \mid S)\,P_T^{TRN}(S)}} \\ &= \frac{1}{1 + \exp\left\{\ln \dfrac{p_T(x \mid NS)\,P_T^{TRN}(NS)}{p_T(x \mid S)\,P_T^{TRN}(S)}\right\}} \\ &= \frac{1}{1 + \exp\left\{-\left[\ln p_T(x \mid S) - \ln p_T(x \mid NS) + \ln \dfrac{P_T^{TRN}(S)}{P_T^{TRN}(NS)}\right]\right\}}. \end{aligned}$

It is evident that the task of the WNN prior to the logistic output unit is to approximate the term between square brackets: a log-likelihood ratio function plus a bias. Therefore, to correct $P_T^{TRN}(S \mid x)$ after having already trained the network, it is not necessary to retrain or scale the output; rather, one can go inside the WNN and replace the constant bias $c_0^{lin}$ in the linear combiner unit with $\ln(P_T^{true}(S)/P_T^{true}(NS))$. The bias term $c_0^{lin}$ was verified to be the correct value (zero) in all our equiprobable experiments.
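The difference between bias replacement and naive output rescaling can be checked numerically. In this sketch the class-conditional density values at a single feature value x are hypothetical, and the network is idealized as computing exactly "log-likelihood ratio plus bias" before the logistic unit, as in the derivation above. Replacing the bias reproduces the Bayes posterior under the true prior; multiplying the balanced-training output by $P_T^{true}(S)/P_T^{TRN}(S)$ does not.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def posterior_from_bias(log_lr, bias):
    """Idealized WNN: logistic(log-likelihood ratio + bias)."""
    return logistic(log_lr + bias)

def bayes_posterior(p_x_s, p_x_ns, prior_s):
    """Direct Bayes' rule for P(S|x)."""
    return p_x_s * prior_s / (p_x_s * prior_s + p_x_ns * (1.0 - prior_s))

# Hypothetical class-conditional density values at some feature value x
p_x_s, p_x_ns = 0.08, 0.02
log_lr = math.log(p_x_s) - math.log(p_x_ns)

p_trn = posterior_from_bias(log_lr, bias=0.0)      # balanced TRN: ln(0.5/0.5) = 0
true_prior = 0.005
corrected = posterior_from_bias(log_lr, bias=math.log(true_prior / (1 - true_prior)))
rescaled = p_trn * true_prior / 0.5                # naive output rescaling

print(corrected, bayes_posterior(p_x_s, p_x_ns, true_prior), rescaled)
```

Here the corrected output and the direct Bayes value agree (about 0.0197), whereas the rescaled output (0.008) does not.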

This observation is particularly important in seizure prediction because seizures are relatively rare events and $P_T^{true}(S)$ tends to be very small (for 2 weeks of data and a 10-minute prediction horizon, $P_T^{true}(S) \approx 0.005$). Training the network with such an unbalanced proportion of examples would obscure the very patterns it must pay attention to. Without loss of generality, the network can be trained on a balanced set, with the unbalanced bias term injected later and slowly tuned online if necessary.

The last experiment was repeated, this time with real data for all eleven 10-minute pre-seizure periods available for one of the patients in our database. Eleven profiles of accumulated energy were computed for these pre-seizure periods, and eleven additional profiles were computed over non-overlapping 10-minute baselines (>8 hrs. away from onsets) with random starting times. Profiles were subsampled to 120 points. One of the profiles under each condition was blindly reserved for testing, and the remaining ten were used to train a 4-node WNN. The resulting bias term $c_0^{lin} = 0$ was replaced by $\ln(0.005/(1-0.005)) = -5.29$, as discussed before. FIG. 10 shows that high-certainty prediction was possible in 9 out of the 11 pre-seizures with no false alarms. The best PTOT case can always be achieved by monitoring not a single accumulated energy value in time, but the entire profile with each slide of the window. Features of this profile are then used to train the WNN probability estimator.

By providing a time-based probability measure as output, the system allows a patient or physician to set thresholds for the probability of a seizure over a prediction horizon. Thus, the system can be programmed as to when, whether and how it will issue an alert. The patient can then take suitable action to prepare for the seizure, such as staying in a safe and familiar environment until the period of high probability (i.e., greater than 50%) passes, alerting a physician, manually administering a drug, etc. In addition, the system is programmable to determine when, whether and how preventative actions are automatically taken to stop or prevent a seizure by way of shock therapy, drug delivery, etc.

The feature generation and analysis process used in the system and method according to the present invention is similar to that used in statistical process control (SPC) for engineering and industrial control applications. That is, the methodology of the present invention involves monitoring a parameter or statistic (brain activity features) with respect to a set of thresholds (control limits) in order to distinguish variability due to common causes from variability due to special causes (abnormalities). Persistent deviation of a parameter outside of its control limits signals a developing change in the process, analogous to the prediction of a seizure.
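The SPC analogy can be illustrated with a simple persistence rule: control limits are taken from baseline feature values, and only a run of consecutive out-of-limit values is flagged, so that an isolated excursion is treated as common-cause variability. The Shewhart-style limits (mean ± 3 standard deviations) and the run length of 3 are assumptions chosen for illustration; the text does not specify how the control limits or the persistence criterion are set.

```python
import numpy as np

def control_limits(baseline, k=3.0):
    """Shewhart-style limits: baseline mean +/- k standard deviations (assumed)."""
    mu, sigma = float(np.mean(baseline)), float(np.std(baseline))
    return mu - k * sigma, mu + k * sigma

def persistent_excursions(values, lower, upper, run_length=3):
    """Indices at which `run_length` consecutive values lie outside the limits."""
    values = np.asarray(values, dtype=float)
    outside = (values < lower) | (values > upper)
    flags, run = [], 0
    for i, out in enumerate(outside):
        run = run + 1 if out else 0
        if run >= run_length:
            flags.append(i)
    return flags

baseline = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.0, 1.1, 0.9, 1.0])
lo, hi = control_limits(baseline)
monitored = [1.0, 2.5, 1.0, 2.6, 2.7, 2.8, 1.1]
print(persistent_excursions(monitored, lo, hi))   # -> [5]; the lone spike at index 1 is ignored
```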

EXAMPLES OF USEFUL FEATURES FOR SEIZURE PREDICTION

The following discussion is directed to the utility of various features for predicting the onset of a seizure. Two or more of these features may be fused into a feature vector to train an intelligent prediction subsystem to predict a seizure.

FIG. 11 illustrates one representative parameter, fractal dimension, for four seizures recorded from the same depth electrode in a patient. As with other parameters measured or calculated, seizure onsets had a characteristic appearance, with minor variation. Thus, in several patients with mesial temporal onset originating in one region, the computational burden for seizure detection and prediction may be reduced by tuning the intelligent prediction subsystem to brain activity characteristics specific to particular individuals. This feature is also useful for detecting seizures with great rapidity and accuracy at the time of electrical onset.

FIG. 12 compares signal energy during a ten-minute interictal period, 24 hours removed from any seizures, with a period of time leading up to a seizure, eight minutes prior to ictal onset. Two interesting features recorded during these two periods of time are the amount of total energy and the frequency of energy peaks prior to seizure onset. There are clear bursts of activity approximately two minutes prior to onset. A pre-ictal increase in baseline activity is consistent with information learned from patients when they seem to know when a seizure is impending. This suggests the utility of a method for predicting the probability of seizure onset in real time, based upon accumulated measures of several parameters, including energy.

FIGS. 13-16 show plots of the time-varying discrete wavelet transform (FIG. 13), spectrogram (FIG. 14), energy (FIG. 15), and entropy (FIG. 16). The far right (120 sec.) mark indicates seizure onset. Other marks indicate 20 sec. increments prior to seizure onset, up to 2 minutes prior to the ictal event. These plots illustrate both agreement and synergy of the features at times 40, 60, and 110 secs., corresponding to 80, 60 and 10 sec. prior to seizure onset, respectively. The wavelet transform and spectrogram present greater lower-frequency densities (dark shaded areas) at these times, which correlate with a peak in the parameterized measure of energy (FIG. 15). Similarly, a positive energy peak and a negative entropy peak (FIG. 16) correlate well as late precursors to seizure onset. Combinations of these and other features described above may also prove useful.

FIG. 17 is a plot of the fourth power indicator versus time, obtained by raising the energy signal amplitude to the fourth power. This plot more clearly demonstrates the bursts of power in the signal leading up to the ictal event that are not otherwise present at baseline.

FIG. 18 illustrates the plot of signal energy versus time for two separate one-hour segments in a channel of first visible seizure onset. The top plot is for one hour prior to a patient's seizure. The bottom plot is taken approximately 8 hours away from any seizure activity. These plots indicate that the energy appears to fluctuate more prior to the seizure, exceeding some limit more frequently in the hour prior to the seizure than at other times distant from seizure activity. Thus, these changes may be detected as predictive of seizure onset as much as one hour prior to seizure.

Of the features examined for the two-minute horizon, an interesting feature is the Pisarenko harmonic decomposition, which is mathematically represented by a fifth-order polynomial of the form:

$A(z) = \sum_{j=0}^{5} a_j z^{-j},$

where $z^{-1}$ is a delay operator and the $a_j$ are the polynomial coefficients.

The roots of this polynomial lie on the unit circle in the complex plane. The impulse response of this model is a sum of sinusoids, which provides a clean extraction of the alpha rhythm in the EEG signals.
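A sketch of a standard textbook form of Pisarenko's method for one analysis window, offered as an assumption about how such a decomposition can be computed rather than as the exact procedure used here: build the (p+1)×(p+1) Toeplitz autocorrelation matrix, take the eigenvector of its smallest eigenvalue as the coefficients $a_0, \ldots, a_p$ of $A(z)$, and find the roots of $z^p A(z)$. The text uses a fifth-order polynomial (p = 5, the default below); the usage example uses p = 2 on a synthetic 10 Hz tone because a single real sinusoid corresponds to one conjugate root pair. The 200 Hz sampling rate and the test tone are assumptions.

```python
import numpy as np

def autocorr(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])

def pisarenko_roots(x, order=5):
    """Roots of z^order * A(z), where A(z) = sum_j a_j z^{-j} and (a_j) is the
    eigenvector of the smallest eigenvalue of the autocorrelation matrix."""
    x = np.asarray(x, dtype=float)
    r = autocorr(x - x.mean(), order)
    lags = np.abs(np.subtract.outer(np.arange(order + 1), np.arange(order + 1)))
    R = r[lags]                              # symmetric Toeplitz matrix
    eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
    a = eigvecs[:, 0]                        # noise-subspace eigenvector
    return np.roots(a)                       # np.roots expects highest power first

fs = 200.0                                   # assumed sampling rate (Hz)
n = np.arange(256)                           # one 256-point window, as in FIG. 19
x = np.cos(2 * np.pi * 10.0 / fs * n)        # synthetic 10 Hz (alpha-band) tone
roots = pisarenko_roots(x, order=2)
freqs_hz = np.abs(np.angle(roots)) * fs / (2 * np.pi)
print(np.abs(roots), freqs_hz)               # moduli near 1, frequencies near 10 Hz
```

Sliding the window one sample at a time and plotting `roots` in the complex plane gives the kind of root trajectory shown in FIG. 19.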

FIG. 19 illustrates the movement of the roots of the model polynomial in the complex plane at different instants of time leading up to seizure onset. In each plot, the horizontal axis is the real part of the root and the vertical axis is the imaginary part. There are fixed complex roots for each 256-point window. The window is moved one sample at a time through the signal for 300 window shifts before each plot is drawn. Each plot shown in FIG. 19 shows the poles every 5 samples, where “TTS” means time to seizure.

Notably, during the two minutes preceding the seizure, the roots initially reside at very localized points along the unit circle, in the same locations shown in the first seven frames of FIG. 19; then, at approximately 60 seconds prior to seizure, the roots suddenly begin to spread around the unit circle. This occurred for both seizures recorded from the same patient. These findings were not seen in homologous, contralateral channels. These results suggest reproducibility in the 3 seizures tested, 2 from the same patient and another from a second patient.

Referring now to FIGS. 20, 21 and 22, the changes in the trajectory of three features in a three-dimensional feature space are shown for interictal, pre-ictal and ictal states. The feature space consists of three features: (1) the mean frequency; (2) the fourth power indicator; and (3) the nonlinear energy operator (NEO) duration above a threshold, which is set to discriminate interictal, pre-ictal and ictal periods. The threshold for the NEO duration may be set arbitrarily or adaptively.
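One point of the three-dimensional feature trajectory can be computed per analysis window roughly as follows. The NEO is the standard discrete Teager-Kaiser operator $x[n]^2 - x[n-1]x[n+1]$. The exact definitions of the mean frequency and fourth power indicator used here are not given in this section, so the power-weighted spectral centroid and the mean of the fourth power of the amplitude are assumptions, as are the 1.25-second window, the 200 Hz sampling rate and the threshold value.

```python
import numpy as np

def mean_frequency(x, fs):
    """Power-weighted mean (centroid) frequency of one window, in Hz (assumed definition)."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return float(np.sum(freqs * power) / np.sum(power))

def fourth_power_indicator(x):
    """Mean of the amplitude raised to the fourth power (assumed definition)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 4))

def neo(x):
    """Discrete nonlinear (Teager-Kaiser) energy operator: x[n]^2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def neo_duration_above(x, fs, threshold):
    """Time (s) within the window during which the NEO output exceeds `threshold`."""
    return float(np.count_nonzero(neo(x) > threshold)) / fs

fs = 200.0
t = np.arange(int(1.25 * fs)) / fs           # one 1.25 s window
x = np.sin(2 * np.pi * 8.0 * t)              # synthetic 8 Hz test signal
point = (mean_frequency(x, fs), fourth_power_indicator(x), neo_duration_above(x, fs, 0.05))
print(point)                                 # one point of the feature-space trajectory
```

For a pure sinusoid the NEO output is constant ($\sin^2 \Omega$ for unit amplitude), which is why it responds to changes in both amplitude and frequency.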

The data shown in these figures were derived from a 10-minute period of a single channel from a human depth electrode recording prior to and during a complex partial seizure in a patient with mesial temporal lobe epilepsy. These figures demonstrate the synergy of these three features in distinguishing interictal, pre-ictal and ictal states, which is useful in predicting, and if necessary detecting, the onset of a seizure.

FIG. 20 shows that for most of the interictal period, the combined feature trajectory is confined to a narrow power band, with frequency fluctuation and NEO duration over broad ranges. One brief period demonstrates an “escape trajectory,” indicated by the arrow in the figure, representing a change from baseline conditions. This brief escape from baseline may represent an “attempt” to generate a seizure under conditions not otherwise conducive to seizure generation and propagation. Note that the fourth power indicator scale is 1×10¹⁶.

FIG. 21 shows the feature trajectory during a pre-ictal period. Note that the fourth power indicator scale is 1×10¹⁷. The feature trajectory in this figure demonstrates three consecutive “escapes” of increasing magnitude over time, indicated by the arrows, which herald the ictal state. In real-time viewing, these escape trajectories convey a progressive instability leading up to the ictal or seizure state. Escape trajectories begin several minutes prior to electrographic seizure onset.

FIG. 22 illustrates the feature trajectory during the ictal state. The fourth power indicator scale is 1×10²⁰. The seizure begins with a large “escape loop” followed by a global reduction in energy in the immediate post-ictal period.

Another promising feature for predicting epileptic seizures 20 to 50 minutes prior to EEG onset is accumulated energy. Accumulated energy (AE) was calculated in the region of seizure onset for 13 pre-seizure and 24 baseline recordings obtained from intracranial EEG (IEEG) recordings in 3 patients with mesial temporal lobe epilepsy (MTLE) during evaluation for epilepsy surgery. In all patients, pre-seizure AE deviated in a statistically significant fashion from trajectories calculated during periods far removed from seizure. Patterns of deviation differed between sleep and awake states in all patients. Our results indicate that AE is a useful feature for predicting seizure in patients with MTLE, and may complement other features for seizure prediction with different time horizons.

Turning to FIGS. 23 and 24, accumulated energy as an important feature in seizure prediction will be described. The experimental setting underlying the data shown in FIGS. 23 and 24 is as follows. IEEG data were collected on a Nicolet 5000 Video-EEG acquisition unit. Data were digitally sampled at 200 Hz. Bipolar signals were derived from intracranial depth and strip electrodes to eliminate common mode artifacts, then 60 Hz notch filtering was performed to eliminate line noise. Thirteen pre-seizure and 24 randomly chosen baseline (≥8 hrs from seizure) 50-60 minute IEEG segments were analyzed. Sleep/wake cycles were derived from EEG and patient video data. The AE feature was extracted from the energy of the measured IEEG time series, as explained above.
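The exact AE computation is explained in an earlier part of this description and is not reproduced in this section; the sketch below therefore assumes one simple form: the mean squared amplitude in consecutive windows, accumulated as a running sum, followed by resampling of the profile to 120 points (the subsampling mentioned for the WNN experiment). The window and step lengths and the synthetic signal are hypothetical; only the 200 Hz sampling rate comes from the text.

```python
import numpy as np

def window_energy(x, fs, win_s=1.25, step_s=1.25):
    """Mean squared amplitude in consecutive windows (window/step lengths assumed)."""
    x = np.asarray(x, dtype=float)
    win, step = int(round(win_s * fs)), int(round(step_s * fs))
    starts = range(0, x.size - win + 1, step)
    return np.array([np.mean(x[s:s + win] ** 2) for s in starts])

def accumulated_energy(x, fs, **kwargs):
    """Running sum of windowed energy: one assumed form of the AE trajectory."""
    return np.cumsum(window_energy(x, fs, **kwargs))

def subsample_profile(profile, n_points=120):
    """Resample a profile to n_points by linear interpolation."""
    src = np.linspace(0.0, 1.0, profile.size)
    dst = np.linspace(0.0, 1.0, n_points)
    return np.interp(dst, src, profile)

fs = 200.0                                   # sampling rate used for the IEEG data
rng = np.random.default_rng(0)
x = rng.normal(size=int(600 * fs))           # 10 minutes of synthetic noise (hypothetical)
ae = accumulated_energy(x, fs)
profile = subsample_profile(ae)
print(ae.size, profile.size)                 # 480 windows, 120 profile points
```

Pre-seizure and baseline profiles computed this way can then be compared as trajectories, as in FIGS. 23 and 24.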

Of the 13 pre-seizure and 24 baseline intervals analyzed, all but 1 pre-seizure and 1 baseline trajectory were linearly separable within patients. FIG. 23 presents AE plotted for 5 pre-seizure and 4 baseline intervals for patient 1. Four of five pre-seizure intervals demonstrate trajectories that deviate significantly from the baseline recordings 20 or more minutes prior to seizure onset. One pre-seizure interval continues on a “baseline trajectory” until seizure onset. FIG. 24 shows AE plotted for 4 pre-seizure and 9 baseline intervals during sleep for patient 2. Again, pre-seizure AE trajectories deviated significantly from baseline AE 20 to 50 minutes prior to seizure onset.

With reference to FIG. 25, still another feature is spectral entropy (SE). FIG. 25 shows SE for five pre-seizure intervals and nine baseline intervals for a patient. The down slope on the top five tracings coincides with seizure onset.

Spectral entropies were computed from intracranial EEG signals recorded from six patients with mesial temporal lobe epilepsy. Sixty-minute segments of 35 pre-seizure and 50 randomly chosen baseline periods (6 hours from the seizure) were analyzed from a total of 6 patients by evaluating bipolar channels in the ictal onset zone, derived from digital IEEG signals recorded referentially. Spectral entropies were calculated in a sliding window of 30 seconds with 50% overlap.
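A sketch of SE computed in 30-second windows with 50% overlap, as described above. The specific SE definition (Shannon entropy, in bits, of the normalized FFT power spectrum) and the 200 Hz sampling rate are assumptions; the section does not state them. A pure tone concentrates power in one bin and gives SE near zero, while broadband noise gives a high SE, which matches the reading of an SE decrease as increased organization of activity.

```python
import numpy as np

def spectral_entropy(x):
    """Shannon entropy (bits) of the normalized power spectrum of one window."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    p = power / power.sum()
    p = p[p > 0]                       # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

def sliding_spectral_entropy(x, fs, win_s=30.0, overlap=0.5):
    """SE in sliding windows of win_s seconds with the given fractional overlap."""
    win = int(round(win_s * fs))
    step = int(round(win * (1.0 - overlap)))
    return np.array([spectral_entropy(x[s:s + win])
                     for s in range(0, len(x) - win + 1, step)])

fs = 200.0                                           # assumed sampling rate (Hz)
t = np.arange(int(30 * fs)) / fs
tone = np.sin(2 * np.pi * 10.0 * t)                  # highly "organized" signal
noise = np.random.default_rng(0).normal(size=t.size) # disorganized signal
print(spectral_entropy(tone), spectral_entropy(noise))
print(sliding_spectral_entropy(np.concatenate([noise, tone]), fs))
```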

Significant changes in SE were observed in all of the 6 patients evaluated. SE successfully detected the unequivocal electrographic onset (UEO) in all 6 patients and predicted 17 of the 25 seizures in 4 of the patients over a range of 1 to 20 seconds prior to UEO. A decrease in SE occurred on or before the UEO, indicating increased organization of activity prior to and during a seizure.

SE provides a measure of organization in neural function which, preliminary experiments suggest, may be useful in seizure prediction and detection. In the setting of MTLE, SE may detect synchronization of activity in the ictal onset and epileptogenic zones, which may be indicative of imminent seizure onset and propagation. Spectral entropy is among a number of promising quantitative features which may synergistically forecast seizures and help determine a mechanism for ictogenesis in MTLE.

Turning to FIGS. 26-29, the utility of prodromes will be described. Pre-ictal prodromes are specific pre-ictal patterns which occur on the EEG, either visible to the eye or only discovered computationally, which herald seizure onset. They may increase in their frequency of occurrence, their amplitude or their duration as the seizure approaches. FIG. 26 illustrates one example of a prodrome, visible to the eye as high-frequency rhythmic activity which “evolves” in frequency and amplitude over time. Four prodromes are shown in FIG. 26. The first three are self-limited and dissipate. The fourth prodrome gives rise to a seizure.

FIG. 27 demonstrates that this particular prodrome is rare far from seizures: it occurs only once during the 26-hour baseline period shown, which is far removed from any seizures.

FIG. 28 demonstrates that the occurrence of this activity increases as seizures approach. In particular, this figure shows the occurrence of pre-ictal prodromes prior to seizure onset during a 38-hour period surrounding seizures. The lines of amplitude=1 are pre-ictal prodromes, and thick lines demonstrate clusters of prodromes prior to seizures. Lines of amplitude=5 are seizures (6 in total). Since prodromes cluster near the time of seizure onset, they are not all easily seen; therefore, the number of prodromes detected prior to each seizure, not including the “terminal” prodrome which actually begins the seizure, is written in parentheses next to each seizure line on the graph. In summary, FIG. 28 shows that pre-ictal prodromes occur almost exclusively within 3 hours of seizure onset, often cluster together prior to seizure onset, and are predictive of oncoming seizures.

FIG. 29 depicts the predictive horizon of the prodromes for the same patient as in FIGS. 26-28, and their time of occurrence relative to seizure onset. The prodromes at the #1 position occur closest to seizure onset. Seizures have a variable number of pre-ictal prodromes, ranging up to 11 per seizure onset. In this patient, most prodromes began on average 2.5 to 3 hours prior to seizure onset. This figure demonstrates that in most cases pre-ictal prodromes occur within 3 hours of unequivocal electrical seizure onset. In this scheme, it is evident that the majority of the prodromes occur between 10,000 and 15,000 seconds before seizure onset. A stepped treatment scheme, escalating in strength of treatment, can be tied to prodrome detection, which can be quantitative, feature driven, or accomplished via pattern matching. A mild intervention might be triggered upon detection of a single prodrome. This intervention is escalated upon detection of further prodromes, as a function of their number, the period of time elapsed between them, and characteristics of the prodromes themselves, such as their amplitude, duration, and frequency characteristics.
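A stepped scheme of this kind might map recent prodrome detections to a treatment level as sketched below. This is a hypothetical rule, not the one described: the level is the number of prodromes detected in the last 3 hours (the horizon suggested by FIGS. 28 and 29), raised by one more step when the two most recent prodromes are closely spaced, and capped at a maximum. The 600-second spacing, the cap and the level names are illustrative; prodrome amplitude, duration and frequency characteristics would be further inputs in a fuller version.

```python
def escalation_level(prodrome_times_s, now_s, window_s=3 * 3600,
                     cluster_s=600, max_level=4):
    """Hypothetical stepped-treatment rule driven by prodrome detections.

    prodrome_times_s: detection times (s) of pre-ictal prodromes.
    Returns 0 (no intervention) up to max_level.
    """
    recent = sorted(t for t in prodrome_times_s if now_s - window_s <= t <= now_s)
    level = len(recent)                       # one step per prodrome in the window
    if len(recent) >= 2 and recent[-1] - recent[-2] <= cluster_s:
        level += 1                            # closely spaced prodromes: escalate further
    return min(level, max_level)

LEVELS = ["none", "mild", "moderate", "strong", "maximal"]   # hypothetical labels
print(LEVELS[escalation_level([5000], now_s=10000)])               # -> mild
print(LEVELS[escalation_level([1000, 9000, 9300], now_s=10000)])   # -> maximal
```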

A specific example of this system is as follows. A feature vector for a particular patient is generated that contains windowed features (i.e., features calculated over a particular time window, such as 1.25 seconds), such as mean frequency, the 4th power indicator, a single scale of the wavelet transform, spectral entropy, and signal energy. A complementary historical feature vector is generated that contains counts of the occurrence of a pre-ictal prodrome in the last “n” time windows (detected by template matching or by frequency/time domain characteristics), counts of drops in fractal dimension below a certain threshold in the last “n” windows, and features of accumulated energy profiles, including the last value and the number of slope changes. Both of these feature vectors are fed into the series of wavelet neural networks, and probabilities of seizure occurrence for each time horizon are continuously calculated. Higher probabilities result when several pre-ictal prodromes are detected in a 3-hour period, when the trend in accumulated energy deviates by a certain threshold amount from baseline tracings, or when the WNNs calculate an increased probability of a seizure based on feature behavior that is not generally visible to the naked eye.
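The historical feature vector can be assembled from the last n windows roughly as follows, and then concatenated with the windowed features to form the WNN input. The function names are hypothetical, and so is the definition of "number of slope changes": because a running-sum AE profile never changes the sign of its slope, this sketch counts sign changes of the second difference (switches between accelerating and decelerating accumulation) instead. The numeric values in the example are made up.

```python
import numpy as np

def historical_feature_vector(prodrome_flags, fractal_dims, ae_profile, fd_threshold):
    """Hypothetical historical features over the last n windows:
    [prodrome count, fractal-dimension drops, last AE value, AE slope changes]."""
    n_prodromes = int(np.sum(prodrome_flags))
    n_fd_drops = int(np.sum(np.asarray(fractal_dims, dtype=float) < fd_threshold))
    ae = np.asarray(ae_profile, dtype=float)
    curvature = np.sign(np.diff(ae, n=2))     # assumed notion of a "slope change"
    curvature = curvature[curvature != 0]
    n_slope_changes = int(np.count_nonzero(np.diff(curvature)))
    return np.array([n_prodromes, n_fd_drops, ae[-1], n_slope_changes], dtype=float)

def combined_feature_vector(windowed, historical):
    """Concatenate windowed and historical vectors as the WNN input."""
    return np.concatenate([np.asarray(windowed, dtype=float),
                           np.asarray(historical, dtype=float)])

hist = historical_feature_vector([0, 1, 0, 0, 1], [1.6, 1.2, 1.5, 1.1, 1.4],
                                 [0.0, 1.0, 3.0, 4.0, 6.0], fd_threshold=1.3)
# Windowed features: mean frequency, 4th power, wavelet scale, SE, energy (made-up values)
windowed = [9.5, 2.1e3, 0.4, 7.8, 55.0]
print(combined_feature_vector(windowed, hist))
```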

In summary, the present invention is directed to a fully automatic implantable system (apparatus and method) for monitoring electrical activity of the brain; extracting a set of (at least one) features from the measured brain activity that have been determined, a priori, to be predictive of seizure onset (in a particular individual, a class of individuals or all individuals); continuously analyzing the set of features derived from real-time brain activity data and other complementary physiologic parameters with an intelligent prediction subsystem trained to predict when a seizure in the brain is imminent based on the set of features; and generating an output indicative of the likelihood of seizure occurrence. The method may further include the step of automatically alerting a patient and/or delivering intervention measures (pharmacological, electrical, etc.) to abort or modulate the seizure. The patient may set predetermined thresholds of probability measures to control when alerts are generated and/or when preventative action is taken. In addition, if a seizure prediction is missed, the system will detect the seizure, and appropriate action can be taken by the patient in response to a system alert.

The present invention involves a self-learning intelligence structure which will download data periodically and improve its own performance over time. Some of the processing, training and learning may take place off-line on a PC (desktop or portable) at a visiting office unit or via the Internet, cellular telephone network, or other communication medium.

Other new features and advantages of the present invention are:

1. Bias adjustment of the outputs to reflect the relatively low probability of seizure occurrence over time in most individuals, which has the effect of lowering false alarm rates.

2. Artificially creating optimized features for use in conjunction with conventional features as inputs into the probability estimation structure (i.e., the predictor). These features may be synthesized by the trainable intelligent structure of the system as it learns.

The system and method according to the present invention provide several unique features and advantages over known technologies. For example, the present invention employs continuous probabilistic forecasting, and continuously outputs a probability measure, which is an estimate of the exact probability function determined for seizure occurrence according to the prediction methods of the present invention. In addition, the present invention employs multiple adjustable prediction time periods or time frames. Also unique to this invention, therapeutic intervention triggered by this prediction method is adjusted according to the probability measure output and/or time horizon to seizure, so that as seizures become closer and more likely, the modality or parameters of the intervention measure (duration, strength, etc.) are changed, for example by triggering a more aggressive therapy, to abort the event.

Continuous probability outputs have advantages such as providing empirical degrees of confidence, easy conversion into on-off warning signals, and use as a continuous control for automatic drug delivery or seizure-mitigating electrical measures. That is, a character of a seizure treatment or intervention measure (such as strength, duration, intensity, etc.) can be based on the continuous probability output, its integral, its derivative, and/or any linear or nonlinear mathematical function thereof.
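As one illustration of basing the character of an intervention on the continuous probability output, its integral and its derivative, the sketch below forms a weighted sum of the three and clips it to a normalized intensity range, much like a PID-style control law. The linear form, the gains and the clipping range are hypothetical choices, not values from this description; any other linear or nonlinear function could be substituted.

```python
import numpy as np

def intervention_intensity(prob, dt, k_p=1.0, k_i=0.0, k_d=0.0, max_intensity=1.0):
    """Hypothetical mapping from a sampled probability trace (sample spacing dt)
    to a normalized intervention intensity, using the probability itself,
    its running integral and its derivative. Gains are illustrative only."""
    prob = np.asarray(prob, dtype=float)
    integral = np.cumsum(prob) * dt           # rectangle-rule running integral
    derivative = np.gradient(prob, dt)        # finite-difference derivative
    u = k_p * prob + k_i * integral + k_d * derivative
    return np.clip(u, 0.0, max_intensity)

trace = [0.1, 0.2, 0.4]                       # rising seizure probability (made up)
print(intervention_intensity(trace, dt=1.0))              # proportional only
print(intervention_intensity(trace, dt=1.0, k_i=0.5))     # adds accumulated risk
print(intervention_intensity(trace, dt=1.0, k_d=1.0))     # reacts to the rise
```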

Accordingly, another aspect of the present invention, which has utility independent of the method of predicting onset of seizures and estimating the probability of seizure onset, involves applying intervention measures to an animal to abort or modulate a seizure by adjusting the modality of an intervention measure and/or parameters of an intervention measure based upon a probability measure indicative of a likelihood of seizure occurrence and/or a predicted time to seizure onset. These methods control the interaction between the diagnostic and therapeutic portions of a seizure prediction and treatment system. A variety (i.e., modalities) of intervention measures are applied to abort or modulate a seizure, such as:

1. electrical stimulation to abort a seizure

2. pacing paradigms or patterns of electrical stimulation

3. local infusion of drugs or chemicals such as benzodiazepines, antiepileptic drugs, neurotransmitters or their agonists and antagonists, behavioral stimuli, the duration and/or intensity of which is related to a particular neural signal to cancel patterns known to precede or induce seizures.

For example, the modalities of intervention measures (and parameters thereof) may track algorithms which predict EEG and/or clinical onset of seizures based upon multiple features of a feature set, such as the EEG and/or a variety of other physiological parameters, including the electrocardiogram and other features derived from it (e.g., heart rate variability), pupillary diameter, skin resistance, respiratory rate, and serum catecholamines.

In this scheme, a monitoring algorithm looks for information in the modeled biological parameters that signals seizure onset or an approaching seizure. Based upon previously selected threshold criteria, such as a relatively low probability measure and/or relatively remote time to seizure onset (prediction horizon), a particular modality and character or parameters of a treatment or intervention measure are chosen for an initial treatment response. If the initial treatment response is ineffective, and/or seizure indicators continue to indicate an approaching seizure (e.g., increasing probability of seizure occurrence and/or less remote time to seizure onset), subsequent treatment responses are escalated by escalating a character or parameters of an intervention measure, by changing modalities, or by a combination of both (possibly in turn or in combination with initial treatment responses), in a “stronger” attempt to arrest the development of seizures. For example, initial therapy might be a mild pacing current in the region of seizure onset. Should this fail after a particular time period, such as 10 seconds, the current level of the pacing signals may then be escalated and/or the frequency of stimulation altered in an attempt to achieve more efficacy. Should this again fail, a local infusion of a small amount of a chemical agent or drug to abort seizures may be triggered. Should this still fail, further, more aggressive treatment, potentially including other treatment modalities, may be initiated. Finally, if electrical and/or clinical seizure onset is detected, maximal-intensity treatment with a variety of modalities (pacing, electrical shock, drugs, etc.) may be administered in an attempt to minimize the clinical effect of the seizure.
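The escalation sequence in the example above can be sketched as a small state machine. It is a hypothetical controller under stated assumptions: it uses only the probability measure (not the predicted time to onset), the start threshold of 0.2 is made up, the 10-second retry period is taken from the example, treatment stops when the probability falls back below the threshold (an assumed policy), and the ladder entries are textual placeholders for the therapies named in the text.

```python
from dataclasses import dataclass

# Placeholder ladder following the example escalation sequence in the text.
LADDER = [
    "mild pacing current in onset region",
    "increased pacing current and/or altered stimulation frequency",
    "local infusion of a small amount of drug",
    "more aggressive multimodal treatment",
]
MAXIMAL = "maximal-intensity treatment (pacing, shock, drugs)"

@dataclass
class EscalationController:
    start_prob: float = 0.2      # hypothetical threshold for the initial response
    retry_s: float = 10.0        # time allowed before escalating (from the example)
    step: int = -1               # -1 means no treatment active
    last_change_s: float = 0.0

    def update(self, now_s, prob, seizure_detected=False):
        """Return the treatment to apply at time now_s, or None."""
        if seizure_detected:
            self.step = len(LADDER) - 1
            return MAXIMAL
        if prob < self.start_prob:
            self.step = -1       # indicators subsided: stop (assumed policy)
            return None
        if self.step < 0:
            self.step, self.last_change_s = 0, now_s
        elif self.step < len(LADDER) - 1 and now_s - self.last_change_s >= self.retry_s:
            self.step += 1       # still above threshold after retry_s: escalate
            self.last_change_s = now_s
        return LADDER[self.step]

ctrl = EscalationController()
for t, p in [(0, 0.1), (1, 0.3), (5, 0.4), (11, 0.5), (21, 0.6)]:
    print(t, ctrl.update(t, p))
```

A second threshold on predicted time to onset could be added in the same `update` method without changing the structure.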

Additionally, these intervention measures may be arranged so that the mildest therapies, with the fewest side effects, are triggered in response to programmed alarms or thresholds with high sensitivity and lower selectivity, since a higher false positive rate may be well tolerated in this scenario; that is, treatments with few side effects are administered far in time from a seizure. As a seizure becomes more imminent and more aggressive therapy may be required, other alarm thresholds with a much higher specificity may be employed, since false positive and negative alarms may be less well tolerated when triggering therapeutic responses with greater clinical effects and greater side effects. Finally, after seizure onset, as detected by a highly sensitive and selective algorithm, a maximal “seizure-arresting” responsive intervention measure may be triggered.

Some of the activities that are monitored and used for determining therapeutic response may be specific EEG patterns, such as increasing complexity of interictal epileptiform discharges, increasing disruption of background activity and/or specific patterns heralding higher probability of seizure onset, such as pre-ictal prodromes.

In addition, the present invention is directed to a method for predicting the probability of seizure onset from electrochemical measures of brain activity, based upon detection and categorization of a cascade of neurophysiologic changes in the brain which occur over time (days, hours, minutes and seconds) prior to and at seizure onset, and which are known to lead to clinically significant epileptic seizures.

The methods for predicting seizure onset and for controlling the application of intervention measures may be implemented entirely through software program(s) executed by a processor on a variety of hardware platforms. In this regard, it is to be understood that these software programs may be embodied as a processor-readable memory medium storing instructions that, when executed by a processor, perform the various prediction and related intervention control steps described above.

The above description is intended by way of example only and is not intended to limit the present invention in any way except as set forth in the following claims.

What is claimed is:
 1. A method for automatically predicting the onset of a seizure in an animal, comprising steps of: (a) monitoring signals indicative of the activity of the brain of an animal; (b) extracting a set of features from the signals; (c) analyzing the set of features with an intelligent prediction subsystem; and (d) generating, in response to analysis of the set of features by the intelligent prediction subsystem, an output indicative of the probability of occurrence of a seizure.
 2. The method of claim 1, wherein the step of generating an output comprises generating a warning that a seizure is likely to occur.
 3. The method of claim 1, wherein the step of generating an output comprises generating a measure of probability that a seizure will occur within a pre-identified time period.
 4. The method of claim 3, and further comprising the steps of: setting a probability threshold; monitoring the probability and comparing it with the probability threshold; and issuing an audible and/or visual warning alert when the probability exceeds the probability threshold.
 5. The method of claim 3, wherein the step of generating an output comprises generating a plurality of probability measures each for a different time period.
 6. The method of claim 5, and further comprising the step of applying an intervention measure, a character of which is based on the probability measure and/or a predicted time to seizure occurrence.
 7. The method of claim 1, and further comprising the step of applying an intervention measure beginning with an initial response when triggered in response to a relatively low probability measure and/or relatively remote predicted time to seizure onset, and escalating a character and/or modality of the intervention measure as the probability measure increases and/or predicted time to seizure onset is less remote.
 8. The method of claim 7, wherein the step of applying an intervention measure comprises applying an intervention measure at a maximal intensity and/or combination of modalities when a feature identifies electrographic seizure onset.
 9. The method of claim 1, and further comprising the step of applying intervention measures comprising pharmacological, cardiac pacing and/or electrical preventative measures to the animal when a seizure is predicted in order to terminate a seizure prior to its electrical or clinical onset, or to terminate a seizure after onset.
 10. The method of claim 1, wherein the step (b) of extracting the set of features comprises extracting one or more instantaneous features.
 11. The method of claim 1, wherein the step (b) of extracting the set of features comprises extracting one or more historical features.
 12. The method of claim 11, wherein the step (b) of extracting the set of features comprises extracting one or more historical features using statistical process control techniques.
 13. The method of claim 1, wherein the step (b) of extracting the set of features comprises artificially generating one or more features from the signals.
 14. The method of claim 1, and further comprising the step of training the intelligent prediction subsystem to predict the onset of a seizure prior to its occurrence for a particular animal from data including signals indicative of the activity of the brain of a particular animal.
 15. The method of claim 14, and further comprising the step of storing the data including signals indicative of the activity of the brain of a particular animal prior to and during a seizure event of the particular animal.
 16. The method of claim 1, wherein the step (a) of monitoring signals comprises monitoring brain activity signals and other physiological signals indicative of the brain activity.
 17. The method of claim 1, wherein the step of generating an output comprises generating a continuously updated probability measure indicative of the likelihood of occurrence of a seizure.
 18. The method of claim 17, wherein the step of generating the probability measure comprises estimating the exact conditional probability function.
 19. The method of claim 17, and further comprising the step of applying an intervention measure, a character of which is based on a mathematical function of the probability measure.
 20. A method for automatically predicting the onset of a seizure in an animal, comprising steps of: (a) monitoring signals indicative of the activity of the brain of an animal; (b) extracting a set of features from the signals; (c) forming a feature vector that is a combination of a plurality of features extracted from the signals; (d) analyzing the set of features with an intelligent prediction subsystem; and (e) generating an output indicative of the likelihood of occurrence of a seizure.
 21. A system for predicting the onset of a seizure in an animal, comprising: (a) at least one electrode for detecting signals indicative of the activity of the brain of an animal; (b) a processor coupled to the at least one electrode, the processor: extracting a set of features from the brain activity signals; forming a feature vector that is a combination of a plurality of features extracted from the signals; continuously analyzing the set of features with an intelligent prediction process; and generating as output a signal indicative of the likelihood of occurrence of a seizure.
 22. A system for predicting the onset of a seizure in an animal, comprising: (a) at least one electrode for detecting signals indicative of the activity of the brain of an animal; (b) a processor coupled to the at least one electrode, the processor: extracting a set of features from the brain activity signals; continuously analyzing the set of features with an intelligent prediction process; and generating as output, in response to continuous analysis of the set of features by the intelligent prediction process, a signal indicative of the probability of occurrence of a seizure.
 23. The system of claim 22, wherein a predictor algorithm to be executed by the processor is trained “off-line” with data comprising signals indicative of brain activity obtained from a particular animal so as to operate “on-line” in real-time on signals obtained by the at least one electrode coupled to the particular individual.
 24. The system of claim 22, wherein the processor is contained within an implantable unit for implantation in a body of an animal.
 25. The system of claim 24, and further comprising a portable unit external of the body of the animal that communicates with the implantable unit via a communication link through the body of the animal, the portable unit comprising an alert device, the processor in the implantable unit generating as output a signal that activates the alert device in response to determining onset of a seizure.
 26. The system of claim 25, wherein the portable unit comprises a display, wherein the processor generates a warning message that is transmitted via the communication link to the display.
 27. The system of claim 25, wherein information stored in the implanted unit is uploaded through the portable unit for transmission to an external computer via a communications network or the Internet.
 28. The system of claim 21, wherein the processor generates an output comprising a probability measure representing the probability that a seizure will occur within a period of time.
 29. The system of claim 28, wherein the processor generates as output a plurality of probability measures, each for a different time period.
 30. The system of claim 29, wherein the processor generates a signal to cause application of an intervention measure, a character of which is based on the probability measure.
 31. The system of claim 21, wherein the processor implements a trainable intelligence network as the intelligent prediction subsystem to analyze the set of features.
 32. The system of claim 31, wherein the processor implements the intelligent prediction process with a wavelet neural network (WNN).
 33. The system of claim 21, wherein the processor detects seizure onset from the set of features and generates a signal for causing the delivery of an intervention measure whose character is based upon multiple features of the feature set.
 34. A method of automatically predicting the onset of a seizure comprising steps of: (a) extracting a plurality of features from signals indicative of the brain activity of an animal; (b) examining the plurality of features and selecting a subset of the plurality of features determined to be predictive of seizure onset in the individual; (c) training an intelligent prediction subsystem to predict a seizure in the individual based on the subset of features; (d) continuously extracting the subset of features from real-time brain activity signals of an individual; (e) continuously analyzing the subset of features with the intelligent prediction subsystem; and (f) continuously generating as output a probability measure that a seizure will occur within a predetermined period of time.
 35. The method of claim 34, wherein the step (f) of continuously generating comprises generating a plurality of probability measures each with respect to a different prediction time horizon.
 36. The method of claim 34, wherein the step (c) of training comprises periodically training the intelligent prediction subsystem based on seizure and baseline data extracted for a particular animal to maintain performance of the intelligent prediction subsystem independent of conditions of the particular animal.
 37. A method for applying intervention measures to an animal to abort or modulate a seizure comprising the step of adjusting the modality of an intervention measure and/or parameters of an intervention measure based upon a probability measure indicative of a likelihood of seizure occurrence and/or a predicted time to seizure onset.
 38. The method of claim 37, wherein the step of adjusting comprises applying an intervention measure beginning with an initial response when triggered in response to a relatively low probability measure and/or relatively remote time to seizure onset, and escalating a character and/or modality of the intervention measure as the probability measure increases and/or time to seizure onset is less remote.
 39. The method of claim 38, wherein the step of applying an intervention measure comprises applying an intervention measure at a maximal intensity and/or combination of modalities when a feature identifies electrographic seizure onset.